Predicting the popularity of tweets using the theory of point processes

dc.contributor.advisor Chen, Feng en_US
dc.contributor.author Tan, Wai Hong en_US
dc.date.accessioned 2022-03-23T11:18:59Z
dc.date.available 2022-03-23T11:18:59Z
dc.date.issued 2019 en_US
dc.description.abstract This thesis focuses on the problem of predicting tweet popularity, that is, the number of retweets stemming from an original tweet. We propose several prediction methodologies based on the theory of point processes, in which the future popularity of a tweet is predicted from the retweet time sequence observed up to a certain censoring time; prediction performance is evaluated on a large Twitter data set. We first propose a marked point process model, termed the Marked Self-Exciting Process with Time-Dependent Excitation Function (MaSEPTiDE for short). The intensity process of the model is interpretable as a cluster Poisson process, which implies that the model can be simulated using a cascading algorithm similar to that used for the efficient simulation of Hawkes processes, and that prediction can be carried out properly by exploiting the probabilistic properties of the model. The MaSEPTiDE approach gives more accurate tweet popularity predictions than state-of-the-art approaches, especially at shorter censoring times. We further propose an inhomogeneous Poisson process model and an estimation method that draws on internal knowledge, namely the times of the retweets observed up to the censoring time, and external knowledge, namely the complete retweet sequences in the training data set. These two sources are combined using a novel empirical Bayes type approach, in which the prior distribution for the model parameter is constructed from the external knowledge and the likelihood is calculated from the internal knowledge. The mode of the posterior distribution is used as the estimator of the finite-dimensional parameter, and suitable functionals of the predictive distribution for the number of retweets implied by the estimated model are used to predict tweet popularity.
The model, termed the EB Poisson model, is found to be both efficient and accurate, with the additional advantage of being able to predict without observing any retweets. The proposed EB approach to inference can be applied to other point process models, such as the MaSEPTiDE model, to improve prediction performance and computational efficiency. We demonstrate this by applying the EB approach to the MaSEPTiDE model and reporting further improvements in prediction accuracy. en_US
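The abstract notes that the MaSEPTiDE intensity can be read as a cluster Poisson process, so the model can be simulated with a cascading algorithm like the one used for Hawkes processes. As a rough illustration of that cascading idea only (not the MaSEPTiDE model itself, whose marks and time-dependent excitation function are not specified in this record), the Python sketch below simulates one retweet cascade of an unmarked Hawkes process with an assumed exponential kernel. The function names and the parameters `branching_ratio`, `decay` and `horizon` are hypothetical.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson(mean) variate with Knuth's multiplication method.

    Adequate for the small offspring means used in this sketch.
    """
    limit = math.exp(-mean)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def simulate_cascade(branching_ratio, decay, horizon, seed=None):
    """Simulate retweet times of one cascade via the cluster (branching) construction.

    Hypothetical, unmarked stand-in: the original tweet is posted at time 0 and
    every event (the original tweet or a retweet) independently spawns
    Poisson(branching_ratio) direct retweets, each delayed by an
    Exponential(decay) time. This is the cluster representation of a Hawkes
    process with excitation kernel phi(u) = branching_ratio * decay * exp(-decay * u).
    Returns the sorted retweet times up to `horizon`.
    """
    if not 0.0 <= branching_ratio < 1.0:
        raise ValueError("branching_ratio must lie in [0, 1) for a finite cascade")
    rng = random.Random(seed)
    retweets = []
    generation = [0.0]  # generation 0: the original tweet
    while generation:
        next_generation = []
        for parent_time in generation:
            for _ in range(poisson(rng, branching_ratio)):
                child_time = parent_time + rng.expovariate(decay)
                # Offspring of a child come even later, so a child beyond the
                # horizon can be dropped together with its whole subtree.
                if child_time <= horizon:
                    next_generation.append(child_time)
        retweets.extend(next_generation)
        generation = next_generation
    return sorted(retweets)


# Example: one simulated cascade observed over 24 time units.
print(len(simulate_cascade(branching_ratio=0.7, decay=0.5, horizon=24.0, seed=1)))
```

With `branching_ratio` below 1 the cascade dies out almost surely, and over an unbounded horizon the expected number of retweets is `branching_ratio / (1 - branching_ratio)`. Repeating the simulation many times gives a Monte Carlo approximation of the distribution of the final retweet count.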
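The empirical Bayes idea in the abstract (a prior built from complete training cascades, a likelihood from the retweets seen before the censoring time, and the posterior mode as the estimator) can be illustrated on a toy inhomogeneous Poisson model. The sketch below assumes an intensity lambda(t) = a * b * exp(-b * t), a Gamma prior on a fitted by moments to training counts, and a normal prior on log b. These modelling choices, the plug-in predictive mean, and all names (`Prior`, `gamma_moments`, `map_estimate`, `predict_total`) are hypothetical and are not taken from the thesis.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class Prior:
    """Hypothetical prior hyperparameters, fitted on complete training cascades.

    a     ~ Gamma(shape, rate)              expected final number of retweets
    log b ~ Normal(log_b_mean, log_b_sd**2)  log decay rate of retweeting activity
    """
    shape: float
    rate: float
    log_b_mean: float
    log_b_sd: float


def gamma_moments(values):
    """Method-of-moments Gamma(shape, rate) fit, e.g. to training retweet counts."""
    mean = statistics.fmean(values)
    var = statistics.variance(values)
    return mean * mean / var, mean / var


def map_estimate(times, censor, prior, grid_size=2001):
    """Posterior mode of (a, log b) for the toy intensity lambda(t) = a*b*exp(-b*t).

    `times` are the retweet times observed in [0, censor] (internal knowledge);
    `prior` encodes the training data (external knowledge). For fixed b the Gamma
    prior is conjugate for a, so a is profiled out in closed form and log b is
    found by a grid search over +/- 6 prior standard deviations.
    """
    if prior.shape < 1.0 or prior.rate <= 0.0:
        raise ValueError("need shape >= 1 and rate > 0 for a finite posterior mode")
    k = len(times)
    sum_times = sum(times)
    a_power = prior.shape + k - 1.0  # exponent of a in the posterior kernel

    def a_rate(b):
        # prior rate + Lambda(censor) / a, where Lambda(censor) / a = 1 - exp(-b*censor)
        return prior.rate - math.expm1(-b * censor)

    def profile_log_post(v):
        # log posterior (up to a constant) with a replaced by its conditional mode
        b = math.exp(v)
        return (-a_power * math.log(a_rate(b)) + k * v - b * sum_times
                - (v - prior.log_b_mean) ** 2 / (2.0 * prior.log_b_sd ** 2))

    lo = prior.log_b_mean - 6.0 * prior.log_b_sd
    hi = prior.log_b_mean + 6.0 * prior.log_b_sd
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    b_hat = math.exp(max(grid, key=profile_log_post))
    return a_power / a_rate(b_hat), b_hat


def predict_total(times, censor, horizon, prior):
    """Plug-in predictive mean of the retweet count at `horizon`.

    Under the toy model, retweets in (censor, horizon] are Poisson with mean
    a * (exp(-b*censor) - exp(-b*horizon)); the posterior mode is plugged in.
    """
    a_hat, b_hat = map_estimate(times, censor, prior)
    future = a_hat * (math.exp(-b_hat * censor) - math.exp(-b_hat * horizon))
    return len(times) + future
```

With no observed retweets the estimate rests on the prior alone: `predict_total([], 0.0, math.inf, Prior(5.0, 0.5, 0.0, 1.0))` returns (5 - 1) / 0.5 = 8.0, echoing the point that the EB Poisson model can predict before any retweet is observed. Profiling out a through Gamma conjugacy keeps the search one-dimensional; a fuller implementation would more likely use a numerical optimizer and a richer intensity.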
dc.language English
dc.language.iso EN en_US
dc.publisher UNSW, Sydney en_US
dc.rights CC BY-NC-ND 3.0 en_US
dc.rights.uri en_US
dc.subject.other Prediction methodologies en_US
dc.subject.other Tweet popularity en_US
dc.subject.other Point processes theory en_US
dc.title Predicting the popularity of tweets using the theory of point processes en_US
dc.type Thesis en_US
dcterms.accessRights open access
dcterms.rightsHolder Tan, Wai Hong
dspace.entity.type Publication en_US
unsw.relation.faculty Science
unsw.relation.originalPublicationAffiliation Tan, Wai Hong, Mathematics & Statistics, Faculty of Science, UNSW en_US
unsw.relation.originalPublicationAffiliation Chen, Feng, Mathematics & Statistics, Faculty of Science, UNSW en_US
unsw.relation.school School of Mathematics & Statistics
unsw.thesis.degreetype PhD Doctorate en_US
Original bundle
public version.pdf
4.33 MB