Automatically Identifying Fake News in Popular Twitter Threads (1705.01613v2)

Published 3 May 2017 in cs.SI

Abstract: Information quality in social media is an increasingly important issue, but web-scale data hinders experts' ability to assess and correct much of the inaccurate content, or "fake news," present in these platforms. This paper develops a method for automating fake news detection on Twitter by learning to predict accuracy assessments in two credibility-focused Twitter datasets: CREDBANK, a crowdsourced dataset of accuracy assessments for events in Twitter, and PHEME, a dataset of potential rumors in Twitter and journalistic assessments of their accuracies. We apply this method to Twitter content sourced from BuzzFeed's fake news dataset and show models trained against crowdsourced workers outperform models based on journalists' assessment and models trained on a pooled dataset of both crowdsourced workers and journalists. All three datasets, aligned into a uniform format, are also publicly available. A feature analysis then identifies features that are most predictive for crowdsourced and journalistic accuracy assessments, results of which are consistent with prior work. We close with a discussion contrasting accuracy and credibility and why models of non-experts outperform models of journalists for fake news detection in Twitter.

Summary

The paper reports that models trained on crowdsourced CREDBANK data correctly classify 65.29% of fake news cases in the BuzzFeed sample.
It applies machine learning with extensive feature selection and distinguishes credibility as perceived by users from factual accuracy.
The study highlights the potential of leveraging crowd intelligence for scalable, real-time misinformation detection on social media.

Automated Detection of Fake News in Twitter Threads

The paper "Automatically Identifying Fake News in Popular Twitter Threads" presents a method for detecting misinformation on social media, specifically within Twitter threads. Because the volume of data makes it hard for experts to evaluate credibility by hand, the authors propose a system that automates the identification of fake news by using machine learning to predict accuracy assessments of Twitter content.

Methodology Overview

The authors use two main datasets: CREDBANK, a crowdsourced dataset of accuracy assessments for Twitter events, and PHEME, a dataset of potential rumors with accuracy assessments made by journalists. They train models to predict these assessments and then apply the models to Twitter data derived from BuzzFeed's fake news dataset. Models trained on the crowdsourced data outperform both models trained on journalists' assessments and models trained on a pooled dataset of the two, which points to a gap between user perceptions of credibility and factual accuracy.
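
The train-on-one-dataset, apply-to-another setup described above can be illustrated with a small sketch. This is not the authors' implementation: the summary does not name their classifier, so a nearest-centroid classifier stands in for it, and the two features, the labels, and every number below are made-up toy values chosen only to show the workflow.

```python
from statistics import fmean


def fit_centroids(rows, labels):
    """Return {label: centroid}, where a centroid is the per-feature mean."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {
        label: [fmean(column) for column in zip(*members)]
        for label, members in grouped.items()
    }


def predict(centroids, row):
    """Assign the label whose centroid is nearest (squared Euclidean distance)."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def accuracy(centroids, rows, labels):
    """Fraction of rows whose predicted label matches the given label."""
    hits = sum(predict(centroids, row) == label for row, label in zip(rows, labels))
    return hits / len(labels)


# Hypothetical features per thread: [share of tweets with media, share with hashtags].
# "source" plays the role of a labeled training set, "target" the held-out sample.
source_rows = [[0.9, 0.8], [0.8, 0.7], [0.2, 0.1], [0.1, 0.3]]
source_labels = ["accurate", "accurate", "inaccurate", "inaccurate"]
target_rows = [[0.7, 0.9], [0.3, 0.2], [0.6, 0.6], [0.4, 0.1]]
target_labels = ["accurate", "inaccurate", "inaccurate", "inaccurate"]

model = fit_centroids(source_rows, source_labels)           # train on source only
transfer_accuracy = accuracy(model, target_rows, target_labels)  # score on target
print(f"transfer accuracy: {transfer_accuracy:.2%}")
```

The point of the sketch is the separation of roles: the model never sees target labels during fitting, so the reported score measures how well assessments learned from one group of annotators carry over to a differently labeled sample.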

Key Findings

The study reports three main findings on fake news detection:

Model Performance: Models trained on CREDBANK outperformed those trained on PHEME when applied to the BuzzFeed sample. Specifically, the model trained on crowdsourced data correctly classified 65.29% of fake news cases, outperforming the model based on journalists' assessments.
Feature Analysis: An extensive feature selection process identified a distinct set of significant features for each dataset. While some features, such as the use of media or hashtags, were common to both datasets, the strongest predictors differed, suggesting that crowdsourced workers and journalists apply different evaluative criteria.
Accuracy vs. Credibility: The research draws a distinction between perceived accuracy (as assessed by crowdsourced non-experts) and factual accuracy (as determined by journalists). This divergence shows why audience perceptions need to be understood when combating misinformation.
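
The feature comparison described above (some features shared, strongest predictors different) can be sketched as a set comparison over ranked features. The feature names and importance scores below are invented for illustration and are not taken from the paper; only the shape of the comparison is the point.

```python
def top_k(importances, k):
    """Return the k feature names with the highest scores, best first."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical importance scores for a crowd-labeled and a journalist-labeled model.
crowd_scores = {
    "has_media": 0.30,
    "has_hashtag": 0.25,
    "author_followers": 0.40,
    "reply_depth": 0.05,
}
journalist_scores = {
    "has_media": 0.20,
    "has_hashtag": 0.35,
    "author_followers": 0.05,
    "reply_depth": 0.40,
}

crowd_top = top_k(crowd_scores, 3)
journalist_top = top_k(journalist_scores, 3)

# Features that matter in both rankings versus the single strongest in each.
shared = sorted(set(crowd_top) & set(journalist_top))
same_strongest = crowd_top[0] == journalist_top[0]

print("shared:", shared)
print("strongest (crowd, journalist):", crowd_top[0], journalist_top[0])
```

With these toy scores the two rankings overlap on the media and hashtag features but disagree on the top predictor, which mirrors the qualitative pattern the summary describes.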

Implications and Future Directions

The paper has both practical and theoretical implications for combating misinformation:

Crowdsourced Intelligence: Because models trained on crowdsourced assessments were effective at distinguishing fake news, such assessments may be useful as a scalable basis for real-time misinformation detection. Drawing on the broad engagement of social media users is a promising way to improve automated detection systems.
Educational Tools: Understanding how non-experts perceive and judge the credibility of online information could serve educational purposes, potentially informing the development of tools that help users navigate social media more critically.
Policy Design: Policymakers and platform designers might use these findings to shape technological or educational interventions that mitigate the spread of misinformation by supporting user-generated accuracy assessments.

To advance this field further, future research could explore hybrid models that integrate both journalistic standards and collective user perceptions, provide empirical evaluation across other social media platforms, or develop interventions based on identified predictive features to preempt the dissemination of disinformation. As computational approaches evolve, the refinement of models and their adaptation to emerging communication modalities will remain essential to maintaining the informational integrity of social networks.