De-amplifying Bias from Differential Privacy in Language Model Fine-tuning (2402.04489v1)

Published 7 Feb 2024 in cs.LG, cs.CR, cs.CY, and stat.ME

Abstract: Fairness and privacy are two important values ML practitioners often seek to operationalize in models. Fairness aims to reduce model bias for social/demographic sub-groups. Privacy via differential privacy (DP) mechanisms, on the other hand, limits the impact of any individual's training data on the resulting model. The trade-offs between privacy and fairness goals of trustworthy ML pose a challenge to those wishing to address both. We show that DP amplifies gender, racial, and religious bias when fine-tuning LLMs, producing models more biased than ones fine-tuned without DP. We find the cause of the amplification to be a disparity in convergence of gradients across sub-groups. Through the case of binary gender bias, we demonstrate that Counterfactual Data Augmentation (CDA), a known method for addressing bias, also mitigates bias amplification by DP. As a consequence, DP and CDA together can be used to fine-tune models while maintaining both fairness and privacy.
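
The mitigation highlighted in the abstract, Counterfactual Data Augmentation (CDA) for binary gender, amounts to pairing each training example with a copy in which gendered terms are swapped, and then fine-tuning (with or without DP-SGD) on the augmented corpus. The sketch below is a minimal illustration of that augmentation step only; the swap list, function names, and capitalization handling are assumptions for illustration rather than the paper's implementation, and the private fine-tuning itself would use a DP-SGD library such as Opacus.

```python
# Minimal sketch of Counterfactual Data Augmentation (CDA) for binary gender.
# The swap list and helper names are illustrative, not the paper's exact code.
import re

# Bidirectional map of gendered terms (assumed; a real list would be larger and
# would also handle ambiguous forms such as possessive "her" more carefully).
GENDER_SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",
    "his": "hers", "hers": "his",
    "man": "woman", "woman": "man",
    "men": "women", "women": "men",
    "boy": "girl", "girl": "boy",
}

_PATTERN = re.compile(r"\b(" + "|".join(GENDER_SWAPS) + r")\b", re.IGNORECASE)


def swap_gendered_terms(text: str) -> str:
    """Return a counterfactual copy of `text` with gendered terms swapped."""
    def _swap(match: re.Match) -> str:
        word = match.group(0)
        swapped = GENDER_SWAPS[word.lower()]
        # Preserve the capitalization of the original token.
        return swapped.capitalize() if word[0].isupper() else swapped
    return _PATTERN.sub(_swap, text)


def cda_augment(corpus: list[str]) -> list[str]:
    """Pair every example with its counterfactual copy; fine-tuning
    (DP or non-DP) then runs on the augmented corpus."""
    return [s for text in corpus for s in (text, swap_gendered_terms(text))]


if __name__ == "__main__":
    print(cda_augment(["He worked as a doctor while she stayed home."]))
    # -> ['He worked as a doctor while she stayed home.',
    #     'She worked as a doctor while he stayed home.']
```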
