The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks (1802.08232v3)

Published 22 Feb 2018 in cs.LG, cs.AI, and cs.CR

Abstract: This paper describes a testing methodology for quantitatively assessing the risk that rare or unique training-data sequences are unintentionally memorized by generative sequence models, a common type of machine-learning model. Because such models are sometimes trained on sensitive data (e.g., the text of users' private messages), this methodology can benefit privacy by allowing deep-learning practitioners to select means of training that minimize such memorization. In experiments, we show that unintended memorization is a persistent, hard-to-avoid issue that can have serious consequences. Specifically, for models trained without consideration of memorization, we describe new, efficient procedures that can extract unique, secret sequences, such as credit card numbers. We show that our testing strategy is a practical and easy-to-use first line of defense, e.g., by describing its application to quantitatively limit data exposure in Google's Smart Compose, a commercial text-completion neural network trained on millions of users' email messages.

Citations (1,018)

Summary

The paper presents an exposure metric that quantitatively assesses the risk of unintended memorization of rare training sequences.
It demonstrates through extensive experiments that memorization begins early during training and persists despite conventional regularization techniques.
It proposes differentially private training methods as an effective safeguard to reduce the risk of extracting sensitive data.

The paper "The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks" addresses a significant concern in machine learning: the unintended memorization of rare or unique training-data sequences by generative sequence models. The concern is most acute when models are trained on sensitive data, such as users' private messages. The paper proposes a testing methodology that quantifies the risk of such memorization and evaluates training methods that reduce it.

Core Contributions and Findings

The key contributions of the paper can be summarized as follows:

Exposure Metric: The authors introduce a quantitative metric for exposure that measures unintended memorization. This metric evaluates whether certain rare or unique sequences from the training data can be predicted with significantly higher probability than other sequences.
Empirical Analysis: Through a series of experiments, the paper demonstrates that unintended memorization is a pervasive issue that appears early in training and persists despite various regularization techniques.
Extractability of Memorized Sequences: By employing new and efficient algorithms, the authors show that it is possible to extract unique secret sequences (e.g., credit card numbers, social security numbers) from trained models.
Differential Privacy as a Solution: The paper finds that common regularization techniques, such as early stopping and dropout, are insufficient to curb unintended memorization. However, differentially private training techniques can effectively mitigate this issue.
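As an illustration of what differentially private training involves, the following is a minimal sketch of one update step in the style of DP-SGD (per-example gradient clipping followed by Gaussian noise). The summary above does not say which differentially private algorithm is meant, so the choice of DP-SGD is an assumption here, and the names `clip`, `dp_sgd_step`, `max_norm`, and `noise_multiplier` are hypothetical. The sketch does not perform privacy accounting, so it does not by itself yield a privacy guarantee.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One noisy gradient step (illustrative, no privacy accounting).

    Each example's gradient is clipped separately, so no single training
    example (such as a rare secret) can dominate the update. Gaussian noise
    scaled to the clipping bound is then added to the summed gradient.
    """
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    batch_size = len(clipped)
    return [p - lr * g / batch_size for p, g in zip(params, noisy)]


# Usage with two hypothetical per-example gradients for a 2-parameter model.
rng = random.Random(0)
new_params = dp_sgd_step(
    params=[0.0, 0.0],
    per_example_grads=[[3.0, 4.0], [0.3, 0.4]],
    lr=1.0,
    max_norm=1.0,
    noise_multiplier=1.1,
    rng=rng,
)
```

The clipping bound limits how much any one sequence can influence the model, and the noise masks what influence remains; ordinary regularizers such as dropout provide neither property.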

Testing Methodology

The exposure metric is measured with the following procedure:

Canary Insertion: Random sequences (called canaries) are inserted into the training data a varying number of times.
Training Models: Models are trained in the usual manner, taking care to maintain consistent hyperparameters and training strategies.
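Calculating Exposure: The model's log-perplexity on each canary is compared with its log-perplexity on other candidate sequences of the same format that were not in the training data. The closer the canary ranks to the most likely candidate, the higher its exposure.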
Evaluating Memorization: Exposure levels are analyzed to gauge the extent of memorization, with higher exposure values indicating a higher likelihood that the model has memorized the canaries.
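The steps above can be sketched in a few lines. The example assumes the rank-based definition of exposure, log2 of the number of candidate sequences minus log2 of the canary's rank when all candidates are sorted by log-perplexity, and computes it by exhaustive enumeration, which is practical only for small candidate spaces. The function names, the 4-digit canary length, and `toy_log_perplexity` (a stand-in for a trained model that scores text by its distance from one memorized string) are all hypothetical.

```python
import itertools
import math
import random

DIGITS = "0123456789"


def make_canary(fmt, rng, length=4, alphabet=DIGITS):
    """Fill the format's slot with random characters; return (canary, secret)."""
    secret = "".join(rng.choice(alphabet) for _ in range(length))
    return fmt.format(secret), secret


def exposure(log_perplexity, fmt, secret, length=4, alphabet=DIGITS):
    """Exposure = log2(number of candidates) - log2(rank of the canary).

    The rank is 1 plus the number of candidates the model finds strictly
    more likely (lower log-perplexity), so ties favor the canary.
    """
    target = log_perplexity(fmt.format(secret))
    rank = 1
    for chars in itertools.product(alphabet, repeat=length):
        if log_perplexity(fmt.format("".join(chars))) < target:
            rank += 1
    return math.log2(len(alphabet) ** length) - math.log2(rank)


def toy_log_perplexity(memorized):
    """Stand-in for a model: score is the Hamming distance to one string."""
    def score(text):
        return sum(a != b for a, b in zip(text, memorized))
    return score


FMT = "The random number is {}"
canary, secret = make_canary(FMT, random.Random(0))
model = toy_log_perplexity(canary)   # pretend training memorized the canary
print(exposure(model, FMT, secret))  # maximum possible value: log2(10**4)
```

With a real model, `log_perplexity` would sum the negative log-probabilities the model assigns to each token of the text, and the same ranking would be applied.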

Experimental Results and Insights

Persistence of Unintended Memorization

The paper shows that unintended memorization is not simply a product of overtraining. By training models on portions of data and measuring exposure at successive stages of training, the authors find that memorization begins early and stabilizes regardless of further training. For instance, a language model trained on the Penn Treebank dataset consistently memorized artificially inserted social security numbers from only a few occurrences in the training data.

Production-Scale Evaluation

The research extends to a large-scale, real-world application: Google’s Smart Compose. The evaluation showed that even with canaries inserted up to 10,000 times, the exposure values, while elevated, remained too low for the canaries to be extracted by naive search. The result illustrates how exposure testing can be used to quantitatively bound data leakage in a production system, while underscoring that safeguards against leakage of sensitive data remain necessary.
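To see why an elevated exposure can still be too low for naive search, it helps to convert exposure back into a rank. The sketch below assumes the rank-based definition of exposure (log2 of the candidate-space size minus log2 of the canary's rank). The function name and the numbers in the usage lines are hypothetical and are not results from the paper.

```python
def candidates_to_enumerate(space_size, exposure):
    """Rank of the canary implied by exposure = log2(space_size) - log2(rank).

    An attacker who tries candidates from most to least likely must test
    about this many before reaching the canary.
    """
    return space_size / (2.0 ** exposure)


# Hypothetical numbers: a 9-digit secret gives 10**9 candidates.
print(candidates_to_enumerate(10**9, 10))  # 976562.5
print(candidates_to_enumerate(10**9, 0))   # 1000000000.0
```

Each additional bit of exposure halves the number of candidates an attacker must test, so extraction becomes cheap only as exposure approaches log2 of the candidate-space size.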

Validating the Exposure Metric with Extractability

To validate the exposure metric, the authors developed an extraction algorithm based on Dijkstra’s shortest-path search. The algorithm showed that sequences with high exposure can be extracted efficiently, confirming that high exposure is a reliable indicator of memorized data.
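A minimal sketch of how such a shortest-path search can work is shown below. It treats each partial secret as a node and each next character as an edge whose cost is the negative log2-probability the model assigns to it, so the cheapest complete path is the sequence with the lowest log-perplexity. The function names, the `next_token_probs` interface, and `toy_model` (a stand-in that favors one memorized string) are hypothetical and do not reproduce the authors' implementation.

```python
import heapq
import math

DIGITS = "0123456789"


def extract_most_likely(next_token_probs, prefix, length, alphabet=DIGITS):
    """Return (log_perplexity, suffix) for the most likely suffix of `length`.

    Uniform-cost (Dijkstra-style) search: edge costs are non-negative, so the
    first complete suffix popped from the heap has the minimum total cost.
    """
    heap = [(0.0, "")]  # entries are (cost so far, partial suffix)
    while heap:
        cost, partial = heapq.heappop(heap)
        if len(partial) == length:
            return cost, partial
        probs = next_token_probs(prefix + partial)
        for ch in alphabet:
            p = probs.get(ch, 0.0)
            if p > 0.0:
                heapq.heappush(heap, (cost - math.log2(p), partial + ch))
    return None


def toy_model(memorized, alphabet=DIGITS, boost=0.5):
    """Stand-in for a trained model that has memorized one string."""
    def next_token_probs(context):
        i = len(context)
        on_path = memorized.startswith(context) and i < len(memorized)
        if on_path and memorized[i] in alphabet:
            rest = (1.0 - boost) / (len(alphabet) - 1)
            probs = {ch: rest for ch in alphabet}
            probs[memorized[i]] = boost  # memorized next character is favored
            return probs
        return {ch: 1.0 / len(alphabet) for ch in alphabet}
    return next_token_probs


PREFIX = "The random number is "
model = toy_model(PREFIX + "281265017")
cost, suffix = extract_most_likely(model, PREFIX, length=9)
print(suffix, cost)
```

Because unlikely partial sequences accumulate cost quickly, the search expands only a small fraction of the 10**9 possible suffixes before reaching the memorized one, which is what makes extraction efficient when exposure is high.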

Practical and Theoretical Implications

The implications of this research are multifaceted:

Privacy Concerns: Models trained on sensitive data without considering memorization risks pose significant privacy threats.
Best Practices in Model Training: Differentially private training methods should be adopted more widely to prevent unintended memorization.
Future Research Directions: Additional studies could investigate alternative model types (such as image classifiers) and further refine the exposure metric.

The findings from this research underscore the significance of integrating privacy-aware training methodologies in machine learning workflows. By highlighting how differential privacy techniques can nearly eliminate the issue of unintended memorization, the paper sets a foundation for future work aimed at safeguarding user privacy in machine-learned models. Further exploration will be necessary to generalize these findings across different types of neural networks and datasets.
