A Non-Parametric Test to Detect Data-Copying in Generative Models (2004.05675v1)

Published 12 Apr 2020 in cs.LG and stat.ML

Abstract: Detecting overfitting in generative models is an important challenge in machine learning. In this work, we formalize a form of overfitting that we call data-copying, in which the generative model memorizes and outputs training samples or small variations thereof. We provide a three-sample non-parametric test for detecting data-copying that uses the training set, a separate sample from the target distribution, and a generated sample from the model, and we study the performance of our test on several canonical models and datasets. For code and examples, visit https://github.com/casey-meehan/data-copying
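
The abstract names the three samples but not the test statistic. Below is a minimal Python sketch of one way to build a three-sample test of this kind. It assumes Euclidean distance and a single global Mann-Whitney U comparison. The function names (`nn_distance`, `mann_whitney_z`, `copying_z`) are hypothetical and not taken from the companion repository. The paper's own statistic may differ in detail, for instance by making the comparison locally within regions of the instance space.

The idea is to measure how close each generated point sits to its nearest training point, and compare that with how close fresh held-out points sit. A strongly negative z-score suggests the model reproduces training samples or small variations of them.

```python
import math
import random
from typing import Sequence

Point = Sequence[float]


def nn_distance(x: Point, train: Sequence[Point]) -> float:
    """Euclidean distance from x to its nearest neighbour in the training set."""
    return min(math.dist(x, t) for t in train)


def mann_whitney_z(gen_d: Sequence[float], ref_d: Sequence[float]) -> float:
    """Normal-approximation z-score of the Mann-Whitney U statistic.

    U counts pairs in which a generated distance exceeds a held-out distance
    (ties count 1/2). The variance uses no tie correction, so this is only a
    rough approximation when many distances coincide.
    """
    m, n = len(gen_d), len(ref_d)
    u = 0.0
    for g in gen_d:
        for r in ref_d:
            if g > r:
                u += 1.0
            elif g == r:
                u += 0.5
    mean = m * n / 2.0
    std = math.sqrt(m * n * (m + n + 1) / 12.0)
    return (u - mean) / std


def copying_z(train: Sequence[Point], held_out: Sequence[Point],
              generated: Sequence[Point]) -> float:
    """Hypothetical three-sample score: negative values mean generated points
    sit closer to the training set than fresh held-out points do."""
    gen_d = [nn_distance(q, train) for q in generated]
    ref_d = [nn_distance(p, train) for p in held_out]
    return mann_whitney_z(gen_d, ref_d)


if __name__ == "__main__":
    rng = random.Random(0)

    def sample(k: int) -> list:
        # Draws k points from a 2-D standard Gaussian (toy target distribution).
        return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(k)]

    train, held_out, fresh = sample(200), sample(100), sample(100)
    # A toy "copying" model: training points plus tiny Gaussian noise.
    copies = [(x + rng.gauss(0, 0.01), y + rng.gauss(0, 0.01))
              for x, y in train[:100]]
    print("fresh model  :", copying_z(train, held_out, fresh))
    print("copying model:", copying_z(train, held_out, copies))
```

Run as a script, the toy copying model should receive a strongly negative score, while the model that draws fresh points should score near zero. The quadratic pairwise loop keeps the sketch dependency-free. For real datasets, a library routine such as SciPy's `mannwhitneyu` and a tree-based nearest-neighbour search would be natural replacements.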

GitHub

casey-meehan/data-copying: companion Git repository to the data-copying paper by Meehan, Chaudhuri, and Dasgupta (AISTATS 2020)