Top-k similarity join over multi-valued objects

open access
Copyright: Xu, Jing
Abstract
The join query is a fundamental tool in many modern application areas, including location-based services, geographic information systems (GIS), and finance and capital markets analysis. Given two sets of objects U and V, a top-k similarity join returns the k most similar pairs of objects from U × V. Top-k similarity joins have been extensively studied and are used in a wide spectrum of applications such as information retrieval, decision making, spatial data analysis, and data mining. In the conventional model of top-k similarity join processing, an object is usually regarded as a point in a multi-dimensional space, and the similarity between two objects is usually measured by a distance metric such as Euclidean distance. However, in many applications such as decision making and e-business, an object may be described by multiple values (instances), and the conventional model is not applicable because it does not account for the distribution of object instances. In this thesis, we study top-k similarity join queries over multi-valued objects. We formalize the problem of top-k similarity join over multi-valued objects using quantile-based distance metrics, which capture the relative distribution of the instances of the objects being compared. Efficient and effective techniques for processing top-k similarity joins over multi-valued objects are developed following a filtering-refinement framework. Novel distance-based, statistic-based, and weight-based pruning techniques are proposed. Comprehensive experiments on both real and synthetic datasets demonstrate the efficiency and effectiveness of our techniques.
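To make the problem statement concrete, the following is a minimal brute-force sketch in Python, not the thesis's method. This record does not give the formal definition of the quantile-based distance, so the sketch assumes one plausible reading: every instance pair carries equal weight, and the φ-quantile distance between two objects is the smallest pairwise Euclidean distance d such that at least a fraction φ of the instance pairs lie within d. The names `quantile_distance` and `top_k_join` are hypothetical, and the sketch scores every pair in U × V rather than applying the filtering-refinement framework or pruning techniques mentioned in the abstract.

```python
import heapq
import math
from itertools import product


def quantile_distance(u, v, phi=0.5):
    """phi-quantile of the pairwise Euclidean distances between instances.

    Assumption (not taken from the thesis): all instance pairs are
    weighted equally.
    """
    dists = sorted(math.dist(a, b) for a, b in product(u, v))
    # Smallest distance d such that at least a phi fraction of pairs are within d.
    rank = max(1, math.ceil(phi * len(dists)))
    return dists[rank - 1]


def top_k_join(U, V, k, phi=0.5):
    """Brute-force baseline: score every pair in U x V and keep the k closest."""
    scored = (
        (quantile_distance(u, v, phi), i, j)
        for (i, u), (j, v) in product(enumerate(U), enumerate(V))
    )
    # Each result is (distance, index into U, index into V), closest first.
    return heapq.nsmallest(k, scored)


# Each object is a list of instances; each instance is a 2-D point.
U = [[(0.0, 0.0), (1.0, 0.0)], [(10.0, 10.0)]]
V = [[(0.0, 1.0)], [(10.0, 11.0), (10.0, 13.0)]]
result = top_k_join(U, V, k=2)
```

This baseline evaluates all |U| · |V| object pairs and all instance pairs within each, which is the cost that a filtering-refinement approach would aim to avoid by discarding unpromising pairs before computing exact distances.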
Author(s)
Xu, Jing
Supervisor(s)
Zhang, Wenjie
Lin, Xuemin
Zhang, Ying
Publication Year
2012
Resource Type
Thesis
Degree Type
Masters Thesis
UNSW Faculty
Files
download whole.pdf 2.42 MB Adobe Portable Document Format