Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization (1805.10367v2)

Published 25 May 2018 in cs.LG and stat.ML

Abstract: As application demands for zeroth-order (gradient-free) optimization accelerate, the need for variance reduced and faster converging approaches is also intensifying. This paper addresses these challenges by presenting: a) a comprehensive theoretical analysis of variance reduced zeroth-order (ZO) optimization, b) a novel variance reduced ZO algorithm, called ZO-SVRG, and c) an experimental evaluation of our approach in the context of two compelling applications, black-box chemical material classification and generation of adversarial examples from black-box deep neural network models. Our theoretical analysis uncovers an essential difficulty in the analysis of ZO-SVRG: the unbiased assumption on gradient estimates no longer holds. We prove that compared to its first-order counterpart, ZO-SVRG with a two-point random gradient estimator could suffer an additional error of order $O(1/b)$, where $b$ is the mini-batch size. To mitigate this error, we propose two accelerated versions of ZO-SVRG utilizing variance reduced gradient estimators, which achieve the best rate known for ZO stochastic optimization (in terms of iterations). Our extensive experimental results show that our approaches outperform other state-of-the-art ZO algorithms, and strike a balance between the convergence rate and the function query complexity.

Citations (161)

Summary

  • The paper introduces ZO-SVRG, a novel variance-reduced zeroth-order algorithm designed to enhance convergence speed in nonconvex optimization problems where gradients are inaccessible.
  • Theoretical analysis shows ZO-SVRG with a two-point estimator achieves an O(1/T) convergence rate but identifies an O(1/b) error term from gradient estimation, leading to proposed accelerated variants.
  • Empirical validation on tasks like chemical classification and adversarial example generation demonstrates that ZO-SVRG and its variants outperform existing state-of-the-art zeroth-order methods.

Overview of Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization

The paper entitled "Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization" focuses on addressing challenges related to zeroth-order (ZO) optimization, particularly in the context of machine learning problems where explicit gradient expressions are difficult to obtain. The research introduces a novel variance-reduced ZO algorithm, ZO-SVRG, which aims to enhance convergence speeds while managing the inherent high variance in ZO gradient estimates.

ZO-SVRG Algorithm and Theoretical Analysis

The core innovation in the paper is the ZO-SVRG algorithm. Unlike its first-order counterpart, SVRG, ZO-SVRG operates without direct gradient information, relying instead on gradient estimators built from function values. A significant theoretical contribution of this work is the identification of an additional error term of order O(1/b), which arises because ZO gradient estimates are biased, so the unbiasedness assumption underlying the standard SVRG analysis no longer holds. The authors provide a rigorous analysis demonstrating that ZO-SVRG with a two-point random gradient estimator achieves a convergence rate of O(1/T), albeit with the aforementioned error component tied to the mini-batch size b.
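To make the construction concrete, the snippet below is a minimal sketch of a two-point random gradient estimator of the kind analyzed in the paper. The function name and interface are illustrative, not taken from the authors' code; the key point is that the estimator approximates the gradient of a smoothed version of the objective, which is the source of the bias discussed above.

```python
import numpy as np

def two_point_grad_estimate(f, x, mu=1e-3, rng=None):
    """Two-point random gradient estimator (illustrative sketch).

    Draws a random direction u on the unit sphere and returns
    (d / mu) * (f(x + mu * u) - f(x)) * u, an estimate of the gradient
    of a smoothed surrogate of f rather than of f itself -- hence the
    bias that breaks the standard SVRG analysis.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # uniform direction on the unit sphere
    return (d / mu) * (f(x + mu * u) - f(x)) * u
```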

To mitigate this error, the paper proposes two accelerated variants of ZO-SVRG that employ variance-reduced gradient estimators. These variants achieve the best-known iteration complexity for ZO stochastic optimization, representing a substantial theoretical advance in managing the trade-off between convergence rate and function query complexity.
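The sketch below illustrates how an SVRG-style outer/inner loop can be driven by ZO estimates. It is a hypothetical illustration under stated assumptions, not the authors' exact algorithm: `f_i(x, i)` is assumed to return the i-th component function value, and `avg_grad_est`, which averages the two-point estimate over `q` random directions, stands in for the variance-reduced estimators used by the accelerated variants.

```python
import numpy as np

def zo_svrg_sketch(f_i, n, x0, epochs=10, inner_iters=50,
                   step=0.01, batch=10, mu=1e-3, q=10, rng=None):
    """Hypothetical SVRG-style loop driven by zeroth-order estimates."""
    rng = np.random.default_rng() if rng is None else rng
    d = x0.size

    def avg_grad_est(g, x):
        # Average the two-point estimate over q random directions
        # to reduce the variance of the ZO gradient estimate.
        est = np.zeros(d)
        for _ in range(q):
            u = rng.standard_normal(d)
            u /= np.linalg.norm(u)
            est += (d / mu) * (g(x + mu * u) - g(x)) * u
        return est / q

    x_tilde = x0.copy()
    for _ in range(epochs):
        # Full ZO gradient estimate at the snapshot point.
        full_obj = lambda z: np.mean([f_i(z, i) for i in range(n)])
        g_tilde = avg_grad_est(full_obj, x_tilde)
        x = x_tilde.copy()
        for _ in range(inner_iters):
            idx = rng.choice(n, size=batch, replace=False)
            f_batch = lambda z: np.mean([f_i(z, i) for i in idx])
            # SVRG-style blending of mini-batch and snapshot estimates.
            v = avg_grad_est(f_batch, x) - avg_grad_est(f_batch, x_tilde) + g_tilde
            x = x - step * v
        x_tilde = x
    return x_tilde
```

The design mirrors first-order SVRG: a periodic full-gradient snapshot corrects each mini-batch estimate, and averaging over multiple random directions plays the role of the variance-reduced estimators that the accelerated variants rely on to remove the O(1/b) error.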

Experimental Validation

The empirical evaluations underscore the practical effectiveness of the proposed methods. The experiments cover diverse applications, including black-box chemical material classification and adversarial example generation for DNNs. In these scenarios, ZO-SVRG and its accelerated variants outperform existing state-of-the-art ZO algorithms. In particular, with suitable parameter settings, the proposed algorithms demonstrate improved convergence rates while effectively balancing iteration complexity against function query overhead.

Implications and Future Prospects

The implications of this research are extensive, impacting both practical applications and theoretical advancements in ZO optimization. Practically, the enhanced ZO-SVRG algorithms can be applied to complex machine learning tasks where gradient information is inaccessible or computationally expensive to obtain. Theoretically, the introduction of error mitigation strategies paves the way for improved algorithm designs that extend first-order optimization techniques to the more challenging ZO domain.

Further developments in this area may focus on enhancing the scalability of ZO algorithms for large-scale machine learning problems and exploring additional methods to minimize function query complexity without sacrificing convergence efficiency. Such endeavors will likely contribute significantly to the broader field of nonconvex optimization, enabling more robust and efficient solutions across a wide range of AI applications.