Predicting the popularity of tweets using the theory of point processes

Access & Terms of Use
open access
Copyright: Tan, Wai Hong
Abstract
This thesis focuses on the problem of predicting tweet popularity, that is, the number of retweets stemming from an original tweet. We propose several prediction methodologies based on the theory of point processes, in which the future popularity of a tweet is predicted from the retweet times observed up to a certain censoring time; prediction performance is evaluated on a large Twitter data set. We first propose a marked point process model, termed the Marked Self-Exciting Process with Time-Dependent Excitation Function, or MaSEPTiDE for short. The intensity process of the model admits an interpretation as a cluster Poisson process. This implies that the model can be simulated using a cascading algorithm similar to that used for the efficient simulation of Hawkes processes, and that predictions can be made properly by exploiting the probabilistic properties of the model. The MaSEPTiDE approach yields highly accurate tweet popularity predictions compared with state-of-the-art approaches, especially at shorter censoring times.
We further propose an inhomogeneous Poisson process model and an estimation method that utilizes internal and external knowledge, based respectively on the times of the retweets observed up to the censoring time and on the complete retweet sequences in the training data set. The two kinds of knowledge are combined using a novel empirical Bayes type approach, where the prior distribution for the model parameter is constructed from the external knowledge and the likelihood is calculated from the internal knowledge. The mode of the posterior distribution is used as the estimator of the finite-dimensional parameter, and suitable functionals of the predictive distribution for the number of retweets implied by the estimated model are used to predict tweet popularity. The model, termed the EB Poisson model, is found to be both efficient and accurate, with the additional advantage of being able to predict without observing any retweets.
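A minimal sketch of the empirical Bayes idea follows, under assumptions that are ours rather than the thesis's: the retweet intensity is taken to be theta * g(t) with a known exponential time profile whose integral G(t) = 1 - exp(-t/tau) tends to 1, and the prior for theta is a conjugate gamma distribution fitted by the method of moments to the final retweet counts of complete training cascades. The names `fit_gamma_prior`, `predict_total`, `shape_cdf` and the parameter `tau` are hypothetical. Under these assumptions, after n retweets by censoring time c the posterior is Gamma(a + n, b + G(c)), its mode serves as the estimate of theta, and the predicted popularity is n plus the expected number of retweets after c.

```python
import math
from statistics import mean, pvariance


def shape_cdf(t: float, tau: float) -> float:
    """G(t) = 1 - exp(-t/tau): assumed (hypothetical) normalised retweet-time profile."""
    return 1.0 - math.exp(-t / tau)


def fit_gamma_prior(total_counts: list[int]) -> tuple[float, float]:
    """Method-of-moments Gamma(a, b) prior for theta from complete training cascades.

    Because G(infinity) = 1, each final count N satisfies N | theta ~ Poisson(theta),
    so Var(N) = E[theta] + Var(theta) and the Poisson noise can be subtracted.
    """
    m = mean(total_counts)
    v = pvariance(total_counts) - m  # estimate of Var(theta)
    if v <= 0:
        raise ValueError("training counts show no overdispersion; gamma prior undefined")
    return m * m / v, m / v  # shape a, rate b


def predict_total(observed_times: list[float], censor: float,
                  a: float, b: float, tau: float) -> float:
    """Predict the final retweet count from retweets observed up to `censor`."""
    n = sum(1 for t in observed_times if t <= censor)
    g_c = shape_cdf(censor, tau)
    # Likelihood on [0, censor] is proportional to theta**n * exp(-theta * G(censor)),
    # so the posterior is Gamma(a + n, b + G(censor)); its mode is 0 when a + n < 1.
    theta_map = max(a + n - 1.0, 0.0) / (b + g_c)
    # Observed retweets plus the expected number still to arrive after the censoring time.
    return n + theta_map * (1.0 - g_c)
```

With no observed retweets and a censoring time of 0, the prediction reduces to the prior mode (a - 1)/b, which illustrates how a model of this kind can produce a prediction before any retweet arrives.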
The proposed EB approach to inference is applicable to other point process models, such as the MaSEPTiDE model, to improve prediction performance and computational efficiency. We demonstrate this by applying the EB approach to the MaSEPTiDE model and report further improvements in prediction accuracy.
Author(s)
Tan, Wai Hong
Supervisor(s)
Chen, Feng
Publication Year
2019
Resource Type
Thesis
Degree Type
PhD Doctorate
UNSW Faculty