Gender Bias in Big Data Analysis (2211.09865v1)
Abstract: This article combines humanistic "data critique" with informed inspection of big data analysis. It measures gender bias when gender prediction software tools (Gender API, Namsor, and Genderize.io) are used in historical big data research. Gender bias is measured by contrasting personally identified computer science authors in the well-regarded DBLP dataset (1950-1980) with exactly comparable results from the software tools. Implications for public understanding of gender bias in computing and the nature of the computing profession are outlined. Preliminary assessment of the Semantic Scholar dataset is presented. The conclusion combines humanistic approaches with selective use of big data methods.