Efficient Influence Related Queries

Download files
Access & Terms of Use
open access
Copyright: Yang, Jianye
Altmetric
Abstract
Recently, there is a surge of interest on mining valuable information from the given datasets. As one of the most important information mining tasks, influence analysis has drawn tremendous attention in both industry and academic communities. Due to the large scale of dataset, there is an emerging call for efficient processing influence related queries. In this thesis, we study three important influence related problems regarding three types of data, i.e., product and user preference data, spatio-textual objects, and set-valued data. Firstly, for product and user preference data, we formulate the problem of influence-based cost optimization on user preference functions, which is critical to unlock the great scientific and social-economic value of these data. By utilizing the classical k-level computation techniques, we show the solution space of our problem can be reduced to a finite number of possible positions (points). To efficient process this problem, we propose a traverse-based 2-dimensional algorithm with linear time complexity. For general multi-dimensional spaces, we develop a space partition based exact algorithm. To accelerate the computation, we further devise a randomized sampling method with high accuracy. Secondly, for spatio-textual objects, we present a novel definition of object influence in applications where objects are of different categories. Under this definition, we investigate the problem of finding the top-k most influential objects. We first show that this problem is NP-hard with respect to the number of object categories. To tackle the computational hardness, we then develop efficient nearest neighbor set based exact as well as approximate algorithms. In particular, our polynomial approximate algorithm has a 2-factor performance guarantee. Finally, for set-valued data, we investigate the problem of set containment join which is an essential and fundamental tool for set-valued data analysis. Based on the computing paradigms, we classify the existing algorithms into two categories, namely intersection-oriented and union-oriented, both of which have their limits. By utilizing the property of skewness in real-life data, we propose a new union-oriented method, namely TT-Join, which not only enhances the advantage of the previous union-oriented methods but also integrates the goodness of intersection-oriented methods by imposing a variant of prefix tree structure.
Persistent link to this record
Link to Publisher Version
Link to Open Access Version
Additional Link
Author(s)
Yang, Jianye
Supervisor(s)
Zhang, Wenjie
Lin, Xuemin
Creator(s)
Editor(s)
Translator(s)
Curator(s)
Designer(s)
Arranger(s)
Composer(s)
Recordist(s)
Conference Proceedings Editor(s)
Other Contributor(s)
Corporate/Industry Contributor(s)
Publication Year
2017
Resource Type
Thesis
Degree Type
PhD Doctorate
UNSW Faculty
Files
download public version.pdf 2.01 MB Adobe Portable Document Format
Related dataset(s)