A Comparison of Retweet Prediction Approaches: The Superiority of Random Forest Learning Method

Abstract: We consider the following retweet prediction task: given a tweet, predict whether it will be retweeted. A wide range of learning methods and features has been proposed for this task. We provide a systematic comparison of these learning methods and features in terms of prediction accuracy and feature importance. Specifically, from each previously published approach we take the best-performing features and group them into two sets: user features and tweet features. In addition, we contrast five learning methods, both linear and non-linear, and we examine the added value of a previously proposed time-sensitive modeling approach. To the authors’ knowledge, this is the first attempt to collect the best-performing features and contrast linear and non-linear learning methods. We perform our comparisons on a single dataset and find that user features, such as the number of times a user is listed, the number of followers, and the average number of tweets published per day, contribute most strongly to prediction accuracy across the selected learning methods. We also find that random forest-based learning, which has not been employed in previous studies, achieves the highest performance among the learning methods we consider. Finally, we find that, on top of properly tuned learning methods, the benefits of time-sensitive modeling are very limited.
Keywords: retweet prediction, machine learning algorithms, performance
Authors: Hendra Bunyamin, Tomas Tunys
Journal Code: jptkomputergg160259
