Efficient processing of XML messages

Copyright: Choi, Ryan Hyun
Abstract
Over the past few years, there has been an increasing number of online applications, such as news and stock-monitoring systems, that exchange and store various types of data on the Internet. However, problems arise when these applications integrate semi-structured and heterogeneous data generated independently by Web service applications. To address this issue, XML was proposed, and it has now become the standard data exchange and storage format on the Internet. In this thesis, we analyse several problems that arise when building an XML publish/subscribe system. First, we look at the problem of expressing complex user queries for the system. While XQuery has become the standard for querying XML data, its complexity has kept it from being as successful as expected. To address this problem, we propose a visual XQuery specification language. Through intuitive abstractions of XML and XQuery, our technique can generate XQuery queries for users who have little knowledge of the language. Second, we look at the problem of processing streaming XML data efficiently against a large number of branch XPath queries. To address query performance issues, we propose a technique that evaluates groups of similar branch queries simultaneously. Moreover, while join operations are being performed, our technique shares intermediate join results as much as possible amongst the queries in the same group. Furthermore, we propose a technique to evaluate queries that contain multiple inter-document, value-based join operations. Experiments show that reducing the overall number of join operations improves query performance significantly. Third, we propose an XML keyword search framework and algorithm that enable users to store and search useful messages received from multiple data sources. Our framework is small in size, and runs existing keyword search algorithms faster.
In addition, we propose a labelling scheme that compactly represents XML data and efficiently supports all operations required by keyword search algorithms. Lastly, we present compressed inverted lists, based on our labelling scheme, that run search operations even faster and support updates. Extensive experiments show the effectiveness of our technique.
Author(s)
Choi, Ryan Hyun
Publication Year
2010
Resource Type
Thesis
Degree Type
PhD Doctorate
UNSW Faculty
Files
Choi-014954990.pdf (65.12 MB, Adobe Portable Document Format)