Protected group bias and stereotypes in Large Language Models (2403.14727v1)
Abstract: As modern LLMs shatter many state-of-the-art benchmarks in a variety of domains, this paper investigates their behavior in the domains of ethics and fairness, focusing on protected group bias. We conduct a two-part study: first, we solicit sentence continuations describing the occupations of individuals from different protected groups, including gender, sexuality, religion, and race; second, we have the model generate stories about individuals who hold different types of occupations. We collect >10k sentence completions from a publicly available LLM and subject them to human annotation. We find bias across minoritized groups, particularly in the domains of gender and sexuality, as well as Western bias, in model generations. The model not only reflects societal biases but appears to amplify them. It is also overly cautious in replies to queries about minoritized groups, producing responses that emphasize diversity and equity so strongly that other group characteristics are overshadowed. This suggests that artificially constraining potentially harmful outputs may itself lead to harm, and should be applied in a careful and controlled manner.
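The two-part prompting protocol summarized above lends itself to a simple scripted collection loop. The sketch below shows one way such a grid of prompts might be generated and logged; the templates, group terms, occupations, sample counts, and the `query_model` wrapper are illustrative placeholders, not the paper's actual materials or API, and `query_model` would need to be swapped for whatever publicly available LLM client is used.

```python
import csv
import itertools

# Hypothetical prompt templates mirroring the two-part study described in the abstract;
# the exact wording used in the paper is not reproduced here.
CONTINUATION_TEMPLATE = "The {group_term} worked as a"          # Part 1: occupation continuations
STORY_TEMPLATE = "Write a short story about a {occupation}."    # Part 2: occupation-based stories

# Illustrative protected-group terms and occupations (assumptions, not the paper's lists).
GROUP_TERMS = ["woman", "man", "nonbinary person", "Muslim person", "Black person"]
OCCUPATIONS = ["nurse", "engineer", "teacher", "CEO"]
SAMPLES_PER_PROMPT = 25  # larger grids and more samples per prompt yield the >10k completions reported


def query_model(prompt: str) -> str:
    """Stand-in for a call to a publicly available LLM endpoint; replace with a real client."""
    raise NotImplementedError


def collect_completions(path: str = "completions.csv") -> None:
    """Sample each prompt repeatedly and write the raw completions out for human annotation."""
    rows = []
    for term, _ in itertools.product(GROUP_TERMS, range(SAMPLES_PER_PROMPT)):
        prompt = CONTINUATION_TEMPLATE.format(group_term=term)
        rows.append(("continuation", term, prompt, query_model(prompt)))
    for occupation, _ in itertools.product(OCCUPATIONS, range(SAMPLES_PER_PROMPT)):
        prompt = STORY_TEMPLATE.format(occupation=occupation)
        rows.append(("story", occupation, prompt, query_model(prompt)))
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows([("task", "condition", "prompt", "completion"), *rows])
```

The resulting CSV keeps the task type and the group or occupation condition alongside each completion, which is the minimum structure needed for the kind of per-group human annotation the study describes.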
Authors: Hadas Kotek, David Q. Sun, Zidi Xiu, Margit Bowler, Christopher Klein