Efficient publish/subscribe processing over geo-textual stream

Access & Terms of Use
open access
Copyright: Wang, Xiang
Abstract
With the prevalence of social media and GPS-enabled devices, a massive amount of geo-textual data is being generated continuously as a stream. In this thesis, we study the problem of efficiently processing streaming geo-textual data in publish/subscribe (pub/sub) systems, which has broad applications in location-based advertising and information dissemination. In a spatial-keyword pub/sub system, users register their interests as spatial-keyword subscriptions (e.g., interest in nearby restaurant discounts); a stream of geo-textual messages (e.g., geo-tagged e-coupons) released by publishers is continuously delivered to the relevant subscriptions. We comprehensively study three important aspects of spatial-keyword pub/sub systems.

Firstly, we investigate boolean-based spatial-keyword pub/sub, where a message is delivered to a subscription if it contains all the subscription keywords and falls inside the subscription range. We handle both stationary and moving subscriptions by proposing a novel adaptive indexing structure, which significantly reduces the processing time of incoming messages.

Secondly, we study ranking-based spatial-keyword pub/sub, where we continuously maintain the top-k most relevant messages for every subscription over a sliding window. We propose a novel index that integrates both spatial-based and keyword-based pruning rules to support efficient message dissemination, and we further develop a cost-based technique to reduce the number of re-evaluations. This is the first work to investigate spatial-keyword pub/sub over a sliding window.

Finally, we investigate distributed stream processing, where a continuous data stream is processed in a distributed manner. We first study distributed stream similarity join over textual data. We develop a novel length-based distribution framework that dispatches incoming data by the number of tokens it contains, which avoids data replication, keeps communication cost low and achieves high throughput. We also design a bundle-based local index that facilitates the local join by grouping similar objects. We then consider geo-textual data by extending ranking-based spatial-keyword pub/sub to a distributed environment. Efficient distribution mechanisms are developed to achieve load balance and high throughput. This is the first work that systematically studies ranking-based spatial-keyword pub/sub in a distributed stream environment.
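
The boolean-based matching rule stated in the abstract (a message is delivered if it contains all the subscription keywords and falls inside the subscription range) can be illustrated with a minimal sketch. This is not the adaptive index proposed in the thesis: it is a brute-force scan, and the names `Subscription`, `Message`, `matches` and `deliver`, as well as the choice of an axis-aligned rectangle for the subscription range, are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    keywords: frozenset
    # Assumed range shape: axis-aligned rectangle (min_x, min_y, max_x, max_y).
    region: tuple


@dataclass(frozen=True)
class Message:
    keywords: frozenset
    location: tuple  # (x, y)


def matches(sub, msg):
    """Boolean semantics: the message lies inside the range AND
    contains every subscription keyword."""
    min_x, min_y, max_x, max_y = sub.region
    x, y = msg.location
    inside = min_x <= x <= max_x and min_y <= y <= max_y
    return inside and sub.keywords <= msg.keywords


def deliver(msg, subs):
    """Brute-force dissemination: test the message against every subscription.
    An index would prune most of these checks; none is attempted here."""
    return [s for s in subs if matches(s, msg)]
```

A linear scan like this costs one check per registered subscription for every incoming message, which is the per-message cost that an indexing structure aims to reduce.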
Author(s)
Wang, Xiang
Supervisor(s)
Lin, Xuemin
Zhang, Wenjie
Zhang, Ying
Publication Year
2017
Resource Type
Thesis
Degree Type
PhD Doctorate
UNSW Faculty
Files
download public copy.pdf 4.27 MB Adobe Portable Document Format