Efficient algorithms for graph substructure search

Download files
Access & Terms of Use
open access
Copyright: Shang, Haichuan
Altmetric
Abstract
Many recent applications strongly demand efficient and effective management of graph structured data such as paths, trees, and general graphs. These applications include Bioinformatics, Chemistry, Social networks, Software and Data Engineering, World Wide Web, etc. The goal of this thesis is to investigate graph substructure search problem. Its focus is placed on enhancing efficiency in subgraph containment search and similarity search. Three subproblems are studied: (1) subgraph containment query; (2) substructure similarity search; and (3) similarity search on supergraph containment. For the subgraph containment query problem, efficient techniques are developed to retrieve graphs, containing a given query graph, from a large set of graphs. Most of the existing techniques based on the framework of filtering-and-verification to reduce the precise computation costs. While the existing techniques worked well for small query graphs, the verification phase became a bottleneck when the query graph size increased. Motivated by this, a novel and efficient algorithm is firstly proposed in this thesis for testing subgraph isomorphism. Secondly, a new feature-based index technique is developed to accommodate the above algorithm in the filtering phase. Substructure similarity search is to retrieve graphs that approximately contain a given query graph. It has many applications, e.g., detecting similar functions among chemical compounds. This thesis proposes a novel indexing technique, GrafD-Index, to index graphs according to their “distances” to features. Then lower and upper bounding techniques are developed to exploit the GrafD-Index to (1) prune non-promising graphs and (2) include graphs whose similarities are guaranteed to exceed the given similarity threshold. Considering that the verification phase is not well studied and plays the dominant role in the whole process, efficient algorithms are devised to verify candidates. This thesis also investigates the problem of efficiently retrieving all data graphs approximately contained by a query graph, namely similarity search on supergraph containment. A novel and efficient index is proposed to boost the efficiency of query processing. The index supports two construction strategies which aim at optimizing the performance for data distributions. Moreover, an algorithm is developed by effectively merging the indexes of individual data graphs; this not only reduces the index size but also further reduces the query processing time.
Persistent link to this record
Link to Publisher Version
Link to Open Access Version
Additional Link
Author(s)
Shang, Haichuan
Supervisor(s)
Lin, Xuemin
Creator(s)
Editor(s)
Translator(s)
Curator(s)
Designer(s)
Arranger(s)
Composer(s)
Recordist(s)
Conference Proceedings Editor(s)
Other Contributor(s)
Corporate/Industry Contributor(s)
Publication Year
2011
Resource Type
Thesis
Degree Type
PhD Doctorate
UNSW Faculty
Files
download whole.pdf 738.58 KB Adobe Portable Document Format
Related dataset(s)