Semantics-Preserved Distortion for Personal Privacy Protection in Information Management (2201.00965v3)

Published 4 Jan 2022 in cs.CL

Abstract: In recent years, machine learning - particularly deep learning - has significantly impacted the field of information management. While several strategies have been proposed to restrict models from learning and memorizing sensitive information from raw texts, this paper suggests a more linguistically-grounded approach to distort texts while maintaining semantic integrity. To this end, we leverage Neighboring Distribution Divergence, a novel metric to assess the preservation of semantic meaning during distortion. Building on this metric, we present two distinct frameworks for semantic-preserving distortion: a generative approach and a substitutive approach. Our evaluations across various tasks, including named entity recognition, constituency parsing, and machine reading comprehension, affirm the plausibility and efficacy of our distortion technique in personal privacy protection. We also test our method against attribute attacks in three privacy-focused assignments within the NLP domain, and the findings underscore the simplicity and efficacy of our data-based improvement approach over structural improvement approaches. Moreover, we explore privacy protection in a specific medical information management scenario, showing our method effectively limits sensitive data memorization, underscoring its practicality.

Citations (1)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Semantics-Preserved Distortion for Personal Privacy Protection in Information Management (2201.00965v3)

Summary

Related Papers