De-amplifying Bias from Differential Privacy in Language Model Fine-tuning (2402.04489v1)
Abstract: Fairness and privacy are two important values machine learning (ML) practitioners often seek to operationalize in models. Fairness aims to reduce model bias against social/demographic sub-groups. Privacy via differential privacy (DP) mechanisms, on the other hand, limits the impact of any individual's training data on the resulting model. The trade-offs between the privacy and fairness goals of trustworthy ML pose a challenge to those wishing to address both. We show that DP amplifies gender, racial, and religious bias when fine-tuning large language models (LLMs), producing models more biased than ones fine-tuned without DP. We find the cause of this amplification to be a disparity in the convergence of gradients across sub-groups. Through the case of binary gender bias, we demonstrate that Counterfactual Data Augmentation (CDA), a known method for addressing bias, also mitigates bias amplification by DP. As a consequence, DP and CDA together can be used to fine-tune models while maintaining both fairness and privacy.
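To make the mitigation concrete: CDA, as commonly implemented for binary gender bias, pairs each training example with a counterfactual copy in which gendered terms are swapped, and fine-tunes on the union of both. The sketch below is a minimal illustration only; the word-pair lexicon, the `counterfactual`/`augment` helpers, and the capitalization handling are assumptions made for this example, not the paper's exact procedure.

```python
import re

# Illustrative binary-gender word pairs; this small lexicon is an assumption
# for the sketch. A realistic CDA setup uses a much larger, curated set
# (pronouns, titles, kinship terms, given names, ...).
GENDER_PAIRS = [
    ("he", "she"), ("him", "her"), ("his", "her"),
    ("man", "woman"), ("men", "women"), ("boy", "girl"),
    ("father", "mother"), ("son", "daughter"),
]

# Symmetric swap table; setdefault keeps the first reverse mapping when a
# word (e.g. "her") appears in more than one pair.
SWAP: dict[str, str] = {}
for a, b in GENDER_PAIRS:
    SWAP[a] = b
    SWAP.setdefault(b, a)


def counterfactual(text: str) -> str:
    """Return a copy of `text` with gendered terms swapped, preserving simple capitalization."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAP.get(word.lower(), word)
        return swapped.capitalize() if word[0].isupper() else swapped
    return re.sub(r"\b\w+\b", repl, text)


def augment(corpus: list[str]) -> list[str]:
    """CDA: train on the union of the original corpus and its counterfactuals."""
    return corpus + [counterfactual(t) for t in corpus]


if __name__ == "__main__":
    print(augment(["He worked as a doctor and his son helped."]))
    # ['He worked as a doctor and his son helped.',
    #  'She worked as a doctor and her daughter helped.']
```

The augmented corpus would then be fed to an otherwise-standard DP fine-tuning loop (e.g., DP-SGD as provided by a library such as Opacus), so only the data distribution changes while the privacy accounting stays the same.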
Authors: Sanjari Srivastava, Piotr Mardziel, Zhikhun Zhang, Archana Ahlawat, Anupam Datta, John C. Mitchell