Efficient Influence Related Queries

Yang, Jianye

doi:10.26190/unsworks/19847

Efficient Influence Related Queries

Download files

Access & Terms of Use

open access
Copyright: Yang, Jianye

CC BY-NC-ND 3.0

Abstract

Recently, there is a surge of interest on mining valuable information from the given datasets. As one of the most important information mining tasks, influence analysis has drawn tremendous attention in both industry and academic communities. Due to the large scale of dataset, there is an emerging call for efficient processing influence related queries. In this thesis, we study three important influence related problems regarding three types of data, i.e., product and user preference data, spatio-textual objects, and set-valued data. Firstly, for product and user preference data, we formulate the problem of influence-based cost optimization on user preference functions, which is critical to unlock the great scientific and social-economic value of these data. By utilizing the classical k-level computation techniques, we show the solution space of our problem can be reduced to a finite number of possible positions (points). To efficient process this problem, we propose a traverse-based 2-dimensional algorithm with linear time complexity. For general multi-dimensional spaces, we develop a space partition based exact algorithm. To accelerate the computation, we further devise a randomized sampling method with high accuracy. Secondly, for spatio-textual objects, we present a novel definition of object influence in applications where objects are of different categories. Under this definition, we investigate the problem of finding the top-k most influential objects. We first show that this problem is NP-hard with respect to the number of object categories. To tackle the computational hardness, we then develop efficient nearest neighbor set based exact as well as approximate algorithms. In particular, our polynomial approximate algorithm has a 2-factor performance guarantee. Finally, for set-valued data, we investigate the problem of set containment join which is an essential and fundamental tool for set-valued data analysis. Based on the computing paradigms, we classify the existing algorithms into two categories, namely intersection-oriented and union-oriented, both of which have their limits. By utilizing the property of skewness in real-life data, we propose a new union-oriented method, namely TT-Join, which not only enhances the advantage of the previous union-oriented methods but also integrates the goodness of intersection-oriented methods by imposing a variant of prefix tree structure.

Persistent link to this record

http://hdl.handle.net/1959.4/58501

DOI

https://doi.org/10.26190/unsworks/19847

Author(s)

Yang, Jianye

Supervisor(s)

Zhang, Wenjie

Lin, Xuemin

Publication Year

2017

Resource Type

Thesis

Degree Type

PhD Doctorate

UNSW Faculty

Files

public version.pdf

2.01 MB

Adobe Portable Document Format

View full record Show statistics

Library

Efficient Influence Related Queries

Access & Terms of Use

Altmetric

Abstract

Persistent link to this record

DOI

Link to Publisher Version

Link to Open Access Version

Additional Link

Author(s)

Supervisor(s)

Creator(s)

Editor(s)

Translator(s)

Curator(s)

Designer(s)

Arranger(s)

Composer(s)

Recordist(s)

Conference Proceedings Editor(s)

Other Contributor(s)

Corporate/Industry Contributor(s)

Publication Year

Resource Type

Degree Type

UNSW Faculty

Files

Related dataset(s)