What's my age?: Predicting Twitter User's Age using Influential Friend Network and DBpedia

Published 10 Apr 2018 in cs.SI | (1804.03362v1)

Abstract: Social media is a rich source of user behavior and opinions. Twitter senses nearly 500 million tweets per day from 328 million users.An appropriate machine learning pipeline over this information enables up-to-date and cost-effective data collection for a wide variety of domains such as; social science, public health, the wisdom of the crowd, etc. In many of the domains, users demographic information is key to the identification of segments of the populations being studied. For instance, Which age groups are observed to abuse which drugs?, Which ethnicities are most affected by depression per location?. Twitter in its current state does not require users to provide any demographic information. We propose to create a machine learning system coupled with the DBpedia graph that predicts the most probable age of the Twitter user. In our process to build an age prediction model using social media text and user meta-data, we explore the existing state of the art approaches. Detailing our data collection, feature engineering cycle, model selection and evaluation pipeline, we will exhibit the efficacy of our approach by comparing with the "predict mean" age estimator baseline.