
The Johnson-Lindenstrauss Transform Itself Preserves Differential Privacy (1204.2136v2)

Published 10 Apr 2012 in cs.DS

Abstract: This paper proves that an "old dog", namely -- the classical Johnson-Lindenstrauss transform, "performs new tricks" -- it gives a novel way of preserving differential privacy. We show that if we take two databases, $D$ and $D'$, such that (i) $D'-D$ is a rank-1 matrix of bounded norm and (ii) all singular values of $D$ and $D'$ are sufficiently large, then multiplying either $D$ or $D'$ with a vector of iid normal Gaussians yields two statistically close distributions in the sense of differential privacy. Furthermore, a small, deterministic and \emph{public} alteration of the input is enough to assert that all singular values of $D$ are large. We apply the Johnson-Lindenstrauss transform to the task of approximating cut-queries: the number of edges crossing a $(S,\bar S)$-cut in a graph. We show that the JL transform allows us to \emph{publish a sanitized graph} that preserves edge differential privacy (where two graphs are neighbors if they differ on a single edge) while adding only $O(|S|/\epsilon)$ random noise to any given query (w.h.p). Comparing the additive noise of our algorithm to existing algorithms for answering cut-queries in a differentially private manner, we outperform all others on small cuts ($|S| = o(n)$). We also apply our technique to the task of estimating the variance of a given matrix in any given direction. The JL transform allows us to \emph{publish a sanitized covariance matrix} that preserves differential privacy w.r.t bounded changes (each row in the matrix can change by at most a norm-1 vector) while adding random noise of magnitude independent of the size of the matrix (w.h.p). In contrast, existing algorithms introduce an error which depends on the matrix dimensions.

Authors (4)
  1. Jeremiah Blocki (48 papers)
  2. Avrim Blum (70 papers)
  3. Anupam Datta (51 papers)
  4. Or Sheffet (24 papers)
Citations (229)

Summary

  • The paper shows that multiplying a matrix by i.i.d. Gaussian entries, as in the JL transform, preserves differential privacy under rank-1 perturbations of bounded norm.
  • It presents algorithms for graph cut queries and covariance estimation that add significantly less noise compared to traditional methods.
  • The results bridge dimensionality reduction and privacy, inspiring new approaches to integrate privacy preservation in data analysis.

Overview of "The Johnson-Lindenstrauss Transform Itself Preserves Differential Privacy"

The paper by Blocki et al. explores a novel application of the classic Johnson-Lindenstrauss (JL) transform within the context of differential privacy, extending its utility beyond traditional applications such as dimensionality reduction, graph embeddings, and compressed sensing. The paper demonstrates that the JL transform can inherently preserve differential privacy when applied to database queries and data-driven outputs.

Key Contributions and Results

The authors establish that by employing the JL transform on matrices derived from datasets, it is possible to maintain differential privacy guarantees under certain conditions. The paper presents theorems showing that if two databases, $D$ and $D'$, differ by a rank-1 matrix of bounded norm, and all singular values of both are sufficiently large, then multiplying these databases by vectors of i.i.d. Gaussian entries results in statistically close distributions in the sense of differential privacy. This is significant because it identifies the JL transform itself as a privacy-preserving primitive, requiring no additional mechanism design.
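A minimal NumPy sketch of this setup may help: it constructs two neighboring databases differing by a rank-1 matrix of unit norm, applies the Gaussian projection, and checks that quadratic-form queries survive sanitization. All names and dimensions here are illustrative, and the sketch only demonstrates the utility side; the paper's privacy guarantee additionally requires the singular values of the input to be large (ensured by a public alteration), which this toy example omits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy database: n rows, d columns (sizes are illustrative).
n, d = 50, 5
D = rng.normal(size=(n, d))

# A neighboring database differs by a rank-1 matrix of bounded norm.
u = rng.normal(size=(n, 1)); u /= np.linalg.norm(u)
v = rng.normal(size=(1, d)); v /= np.linalg.norm(v)
D_prime = D + u @ v  # D' - D has rank 1 and unit norm
assert np.linalg.matrix_rank(D_prime - D) == 1

# The primitive: publish R @ D, where R has i.i.d. N(0, 1) entries.
r = 2000                          # number of projection rows
R = rng.normal(size=(r, n))
sanitized = R @ D / np.sqrt(r)

# Utility check: ||sanitized @ x||^2 concentrates around ||D @ x||^2,
# so quadratic-form queries can be answered from the sanitized output.
x = rng.normal(size=d)
true_val = np.linalg.norm(D @ x) ** 2
est_val = np.linalg.norm(sanitized @ x) ** 2
rel_error = abs(est_val - true_val) / true_val
print(rel_error)  # concentrates near 0 for large r
```

The relative error of the estimator has standard deviation roughly `sqrt(2 / r)`, so larger sketches trade computation for accuracy.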

The paper further explores this result through applications in graph cut queries and covariance estimation:

  1. Graph Cut Queries: By using the JL transform, the authors put forth an algorithm that can publish sanitized graphs retaining edge differential privacy. The noise added by their algorithm is $O(|S|/\epsilon)$, demonstrating improved performance over existing methods, particularly for small sets $S$.
  2. Covariance Estimation: The authors apply the JL technique to estimating the variance of matrices in arbitrary directions, allowing for the publication of sanitized covariance matrices while maintaining privacy. Notably, the noise added by this method does not scale with the size of the matrix, which is a significant advantage over traditional approaches that introduce error proportional to the dimensions of the matrix.
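The cut-query application can be sketched as follows: project the graph's edge-incidence matrix once, then answer any cut query from the published sketch. Variable names are illustrative, and this sketch again shows only the utility side; the paper's actual algorithm first applies a public alteration (boosting the singular values) before projecting, which is what yields the privacy guarantee.

```python
import itertools

import numpy as np

rng = np.random.default_rng(1)

# Toy random graph on n vertices.
n = 30
edges = [(i, j) for i, j in itertools.combinations(range(n), 2)
         if rng.random() < 0.3]

# Edge-incidence matrix: one row per edge, (+1, -1) on its endpoints.
E = np.zeros((len(edges), n))
for k, (i, j) in enumerate(edges):
    E[k, i], E[k, j] = 1.0, -1.0

# Publish the sanitized sketch M = R @ E / sqrt(r) once; all subsequent
# cut queries are answered from M without touching the graph again.
r = 4000
R = rng.normal(size=(r, len(edges)))
M = R @ E / np.sqrt(r)

def cut_estimate(S):
    """Estimate the number of edges crossing the (S, complement) cut."""
    ind = np.zeros(n)
    ind[list(S)] = 1.0
    # For the incidence matrix, ||E @ ind||^2 equals the exact cut value,
    # and ||M @ ind||^2 is its JL estimate.
    return np.linalg.norm(M @ ind) ** 2

S = {0, 1, 2, 3, 4}
true_cut = sum((i in S) != (j in S) for i, j in edges)
print(true_cut, cut_estimate(S))
```

Note that a single published sketch answers arbitrarily many cut queries, which is where the approach gains over per-query noise addition.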

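For the covariance application, a similar sketch applies: project the data matrix once and derive directional second-moment estimates from the sketch. Names and sizes are illustrative; as above, this toy omits the public alteration the paper uses to guarantee privacy, and it estimates the uncentered second moment rather than a mean-centered variance.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data matrix: n individuals, d attributes.
n, d = 500, 4
A = rng.normal(size=(n, d))

# Publish one sanitized sketch; build a covariance estimate from it.
r = 5000
R = rng.normal(size=(r, n))
sketch = R @ A / np.sqrt(r)
cov_est = sketch.T @ sketch / n  # approximates A.T @ A / n

def directional_variance(x):
    """Estimated (uncentered) variance of the data in unit direction x."""
    return x @ cov_est @ x

x = np.array([1.0, 0.0, 0.0, 0.0])
true_var = np.linalg.norm(A @ x) ** 2 / n
print(true_var, directional_variance(x))
```

The estimation error here depends only on the sketch size `r`, mirroring the paper's claim that the added noise is independent of the matrix dimensions.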
Theoretical and Practical Implications

From a theoretical standpoint, this paper bridges the gap between dimensionality reduction and privacy preservation in data analysis. The identified link between the JL transform and differential privacy suggests novel pathways for integrating privacy guarantees in algorithms that already utilize JL embeddings, removing the need for additional privacy-specific alterations.

Practically, the results imply potential efficiencies in handling data privacy without compromising the output's utility. Particularly in scenarios where dimensionality reduction or rapid estimation of certain properties is essential, the JL transform offers a way to do so while inherently respecting privacy constraints. This has ramifications for a variety of fields where the sensitivity and privacy of data are of concern, including social network analysis, machine learning, and statistical data analysis.

Future Directions

The insights provided in this paper open avenues for further exploration of various transforms and their inherent privacy-preserving capabilities. The implication that classical transforms can be used to ensure differential privacy could alter how algorithms are structured in privacy-sensitive environments. Investigation into other transformations and their interactions with privacy could yield similarly useful results, offering alternatives to the current noise-addition techniques prevalent in privacy-preserving data analysis.

There remains potential to further refine the analysis and applications of the JL transform in the context of privacy, perhaps by developing even more efficient versions of the transform or overcoming some limitations of dimensionality bounds. Additionally, evaluating the adaptability of this approach for non-Gaussian distributions or other variations of data perturbation could be beneficial.

In conclusion, this work enhances the toolkit available for privacy-preserving data analysis by embedding it into widely-utilized transformations, thus facilitating broader and more efficient applications.