On the Ground Validation of Online Diagnosis with Twitter and Medical Records (1404.3026v1)
Abstract: Social media has been considered as a data source for tracking disease. However, most analyses are based on models that prioritize strong correlation with population-level disease rates over determining whether specific individual users are actually sick. Taking a different approach, we develop a novel system for social-media-based disease detection at the individual level using a sample of professionally diagnosed individuals. Specifically, we develop a system for making an accurate influenza diagnosis based on an individual's publicly available Twitter data. We find that about half (17/35 = 48.57%) of the users in our sample who were sick explicitly discuss their disease on Twitter. By developing a meta classifier that combines text analysis, anomaly detection, and social network analysis, we are able to diagnose an individual with greater than 99% accuracy even if she does not discuss her health.
- Todd Bodnar
- Victoria C Barclay
- Nilam Ram
- Marcel Salathé
- Conrad S Tucker