Emergent Mind

Abstract

Differentially Private Stochastic Gradient Descent (DPSGD) is widely utilized to preserve training data privacy in deep learning, which first clips the gradients to a predefined norm and then injects calibrated noise into the training procedure. Existing DPSGD works typically assume the gradients follow sub-Gaussian distributions and design various clipping mechanisms to optimize training performance. However, recent studies have shown that the gradients in deep learning exhibit a heavy-tail phenomenon, that is, the tails of the gradient have infinite variance, which may lead to excessive clipping loss to the gradients with existing DPSGD mechanisms. To address this problem, we propose a novel approach, Discriminative Clipping~(DC)-DPSGD, with two key designs. First, we introduce a subspace identification technique to distinguish between body and tail gradients. Second, we present a discriminative clipping mechanism that applies different clipping thresholds for body and tail gradients to reduce the clipping loss. Under the non-convex condition, \ourtech{} reduces the empirical gradient norm from {${\mathbb{O}\left(\log{\max(0,\theta-1)}(T/\delta)\log{2\theta}(\sqrt{T})\right)}$} to {${\mathbb{O}\left(\log(\sqrt{T})\right)}$} with heavy-tailed index $\theta\geq 1/2$, iterations $T$, and arbitrary probability $\delta$. Extensive experiments on four real-world datasets demonstrate that our approach outperforms three baselines by up to 9.72\% in terms of accuracy.

We're not able to analyze this paper right now due to high demand.

Please check back later (sorry!).

Generate a summary of this paper on our Pro plan:

We ran into a problem analyzing this paper.

Newsletter

Get summaries of trending comp sci papers delivered straight to your inbox:

Unsubscribe anytime.