A General Framework for Data-Use Auditing of ML Models (2407.15100v3)

Published 21 Jul 2024 in cs.CR and cs.LG

Abstract: Auditing the use of data in training machine-learning (ML) models is an increasingly pressing challenge, as myriad ML practitioners routinely leverage the effort of content creators to train models without their permission. In this paper, we propose a general method to audit an ML model for the use of a data-owner's data in training, without prior knowledge of the ML task for which the data might be used. Our method leverages any existing black-box membership inference method, together with a sequential hypothesis test of our own design, to detect data use with a quantifiable, tunable false-detection rate. We show the effectiveness of our proposed framework by applying it to audit data use in two types of ML models, namely image classifiers and foundation models.
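As a rough illustration of how a sequential test can cap the false-detection rate, the sketch below runs a one-sided likelihood-ratio test over a stream of binary outcomes. It is not the paper's test, which the abstract describes only as being of the authors' own design. It assumes each outcome is 1 when a black-box membership-inference score favours an item the data owner published over a comparable item the owner held back, so that outcomes are fair coin flips when the data was not used. The function name `sequential_detect` and the alternative success rate `p1` are hypothetical.

```python
import math
from typing import Iterable, Optional


def sequential_detect(
    outcomes: Iterable[int],
    alpha: float = 0.05,
    p1: float = 0.7,
) -> Optional[int]:
    """Flag data use with false-detection rate at most alpha.

    Assumption (not from the paper): under the null hypothesis of no
    data use, each outcome is 1 with probability 0.5; under the
    alternative it is 1 with probability p1 > 0.5.

    Returns the 1-based index of the outcome at which data use is
    flagged, or None if the stream ends without a detection.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if not 0.5 < p1 < 1.0:
        raise ValueError("p1 must lie in (0.5, 1)")

    # Under the null, the likelihood ratio is a nonnegative martingale,
    # so by Ville's inequality it ever reaches 1/alpha with probability
    # at most alpha. Rejecting at that threshold bounds false detections.
    threshold = math.log(1.0 / alpha)
    step_up = math.log(p1 / 0.5)            # added when the outcome is 1
    step_down = math.log((1.0 - p1) / 0.5)  # added when the outcome is 0

    log_lr = 0.0
    for i, outcome in enumerate(outcomes, start=1):
        log_lr += step_up if outcome else step_down
        if log_lr >= threshold:
            return i
    return None
```

Lowering `alpha` makes a false detection less likely but requires more queries before a true detection is flagged, which is the sense in which such a rate is tunable.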
