Measuring and Detecting Virality on Social Media: The Case of Twitter's Viral Tweets Topic (2303.06120v2)

Published 10 Mar 2023 in cs.SI and cs.CY

Abstract: Social media posts may go viral and reach large numbers of people within a short period of time. Such posts may threaten the public dialogue if they contain misleading content, making their early detection crucial. Previous works proposed their own metrics to annotate whether a tweet is viral in order to automatically detect such tweets later. However, such metrics may not accurately represent viral tweets or may introduce too many false positives. In this work, we use the ground truth data provided by Twitter's "Viral Tweets" topic to review the current metrics and also propose our own metric. We find that a tweet is more likely to be classified as viral by Twitter if the ratio of retweets to its author's followers exceeds some threshold. We found this threshold to be 2.16 in our experiments. This rule results in fewer false positives, although it favors smaller accounts. We also propose a transformer-based model for early detection of viral tweets, which reports an F1 score of 0.79. The code and the tweet ids are publicly available at: https://github.com/tugrulz/ViralTweets

Summary

The paper introduces a virality metric based on the log-transformed retweet-to-follower ratio and a detection model that reports an F1 score of 0.79.
It employs a BERT-based transformer model augmented with tweet-level features like media and hashtags to enable early and reliable detection.
The study argues for a systematic, data-driven definition of virality in place of oversimplified metrics, with the aim of supporting efforts against misinformation on social media.

Introduction

The paper "Measuring and Detecting Virality on Social Media: The Case of Twitter's Viral Tweets Topic" analyzes virality on Twitter. It uses Twitter's "Viral Tweets" topic as ground truth data to assess existing metrics for identifying viral tweets. It then proposes a new metric based on the retweet-to-follower ratio, together with a transformer-based model for early detection, and reports promising results compared with previous approaches.

Figure 1: An example of a tweet listed under the Viral Tweets topic page. The tweet was viewed by 851,000 users in 20 hours even though the account had only 1000 followers.

Metrics of Virality

The paper reviews several existing metrics for measuring tweet virality, such as labeling a tweet viral when its retweet count exceeds a hard threshold (RT > T), or normalizing its retweet count by the number of retweets the author's tweets typically receive. These metrics often fail to identify viral tweets accurately because they are oversimplified and do not account for network impact or potential manipulation by bots.
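The two families of baseline metrics mentioned above can be sketched in a few lines of Python. This is only an illustration: the function names, the `factor` parameter, and the choice of the mean as the author's "typical" retweet count are assumptions, not definitions taken from the paper.

```python
from statistics import mean
from typing import Sequence


def viral_by_hard_threshold(retweets: int, threshold: int) -> bool:
    """Hard-threshold rule (RT > T): viral if retweets exceed a fixed T."""
    return retweets > threshold


def viral_by_user_baseline(
    retweets: int, past_retweets: Sequence[int], factor: float
) -> bool:
    """User-normalized rule: viral if retweets exceed `factor` times the
    author's typical retweet count.

    Assumption: "typical" is taken to be the mean of the author's past
    retweet counts; `factor` is a hypothetical tuning parameter.
    """
    if not past_retweets:
        # No history to normalize by, so the rule cannot fire.
        return False
    baseline = mean(past_retweets)
    return retweets > factor * baseline
```

Both rules depend only on raw retweet counts, which is one reason they can mislabel tweets from very large or very small accounts.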

The paper introduces a metric based on the log of the retweet-to-follower ratio. According to the abstract, a tweet is more likely to be classified as viral by Twitter when the ratio of its retweets to its author's followers exceeds a threshold, which the authors found to be 2.16 in their experiments. In the reported experiments, this metric separates viral from non-viral tweets more accurately than the earlier metrics and is less lenient, producing fewer false positives, although it favors smaller accounts.
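A minimal sketch of a retweet-to-follower ratio rule follows. The threshold 2.16 is the value reported in the abstract; this summary does not state whether it applies to the raw or the log-transformed ratio, so the sketch applies it to the raw ratio as an assumption. The function names, the zero-follower guard, the natural logarithm, and the add-one smoothing are likewise assumptions made for illustration.

```python
import math


def retweet_follower_ratio(retweets: int, followers: int) -> float:
    """Raw ratio of retweets to the author's follower count.

    Assumption: accounts with zero followers are treated as having one,
    to avoid division by zero.
    """
    return retweets / max(followers, 1)


def log_retweet_follower_ratio(retweets: int, followers: int) -> float:
    """Log-transformed ratio.

    Assumption: natural log with add-one smoothing on both counts so the
    value is defined when either count is zero.
    """
    return math.log((retweets + 1) / (followers + 1))


def is_viral_by_ratio(
    retweets: int, followers: int, threshold: float = 2.16
) -> bool:
    """Label a tweet viral when the raw ratio exceeds the threshold."""
    return retweet_follower_ratio(retweets, followers) > threshold
```

For example, a tweet with 3,000 retweets from an account with 1,000 followers has a ratio of 3.0 and would be labeled viral, whereas the same retweet count from an account with one million followers would not. This illustrates why such a rule favors smaller accounts.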

Detecting Viral Content

The paper uses a transformer-based machine learning model to predict which tweets will go viral, relying on the tweet's content so that detection can happen early enough to support fact-checking. The model combines a BERT-based language model with additional tweet-level features, such as the presence of media or hashtags.
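The tweet-level features mentioned above could be derived along the following lines. This is a hypothetical sketch: the function name, the feature names, the `media_urls` argument, and the hashtag regular expression are illustrative assumptions and do not reproduce the authors' feature set or the Twitter API's data model.

```python
import re
from typing import Dict, Sequence, Union

# Assumption: a hashtag is '#' followed by one or more word characters.
HASHTAG_PATTERN = re.compile(r"#\w+")


def tweet_level_features(
    text: str, media_urls: Sequence[str] = ()
) -> Dict[str, Union[bool, int]]:
    """Build simple content features that could accompany a text encoder."""
    hashtags = HASHTAG_PATTERN.findall(text)
    return {
        "has_hashtag": len(hashtags) > 0,
        "hashtag_count": len(hashtags),
        "has_media": len(media_urls) > 0,
    }
```

In a setup like the one described, features of this kind would be concatenated with the text representation produced by the language model before classification.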

Figure 2: The retweet counts of the tweets and the follower counts of the authors in bins with size 1000. We observe that the data is skewed left, towards unpopular users with retweet counts less than 10,000.

Combining the model with the proposed virality metric yields an F1 score of 0.79, which the summary reports as higher than that of models without the additional tweet-level features.
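For readers unfamiliar with the F1 score, it is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts; the function name and the convention of returning 0.0 when there are no true positives are assumptions, and the counts in the usage note are invented for illustration rather than taken from the paper.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall)."""
    if true_pos == 0:
        # Precision and recall are both zero (or undefined); report 0.0.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

With 8 true positives, 2 false positives, and 2 false negatives, precision and recall are both 0.8, so the F1 score is also 0.8.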

Discussion and Implications

The empirical analysis highlights the importance of using a systematic and data-driven approach to define virality. By leveraging authentic data provided directly by Twitter, the paper circumvents the flaws present in prior methodologies that rely heavily on public metrics.

The combination of machine learning models with a refined metric offers practical utility in detecting and managing viral content. It also lays a foundation for addressing the challenges platforms face in combating misinformation and its rapid dissemination.

Conclusion

The paper comprehensively evaluates the existing and new metrics for tweet virality, revealing substantial improvements in detection accuracy using refined approaches. This work furnishes a robust framework for researchers and practitioners aiming to tackle virality and its associated consequences in online social networks. Future research may benefit from these insights, applying them across varying platforms that lack inherent data transparency, and extending them to analyze media-rich content, which is shown to have a significant impact on virality.