Efficient algorithms for graph substructure search

Shang, Haichuan

doi:10.26190/unsworks/23811

Efficient algorithms for graph substructure search

Download files

Access & Terms of Use

open access
Copyright: Shang, Haichuan

CC BY-NC-ND 3.0

Abstract

Many recent applications strongly demand efficient and effective management of graph structured data such as paths, trees, and general graphs. These applications include Bioinformatics, Chemistry, Social networks, Software and Data Engineering, World Wide Web, etc. The goal of this thesis is to investigate graph substructure search problem. Its focus is placed on enhancing efficiency in subgraph containment search and similarity search. Three subproblems are studied: (1) subgraph containment query; (2) substructure similarity search; and (3) similarity search on supergraph containment. For the subgraph containment query problem, efficient techniques are developed to retrieve graphs, containing a given query graph, from a large set of graphs. Most of the existing techniques based on the framework of filtering-and-verification to reduce the precise computation costs. While the existing techniques worked well for small query graphs, the verification phase became a bottleneck when the query graph size increased. Motivated by this, a novel and efficient algorithm is firstly proposed in this thesis for testing subgraph isomorphism. Secondly, a new feature-based index technique is developed to accommodate the above algorithm in the filtering phase. Substructure similarity search is to retrieve graphs that approximately contain a given query graph. It has many applications, e.g., detecting similar functions among chemical compounds. This thesis proposes a novel indexing technique, GrafD-Index, to index graphs according to their “distances” to features. Then lower and upper bounding techniques are developed to exploit the GrafD-Index to (1) prune non-promising graphs and (2) include graphs whose similarities are guaranteed to exceed the given similarity threshold. Considering that the verification phase is not well studied and plays the dominant role in the whole process, efficient algorithms are devised to verify candidates. This thesis also investigates the problem of efficiently retrieving all data graphs approximately contained by a query graph, namely similarity search on supergraph containment. A novel and efficient index is proposed to boost the efficiency of query processing. The index supports two construction strategies which aim at optimizing the performance for data distributions. Moreover, an algorithm is developed by effectively merging the indexes of individual data graphs; this not only reduces the index size but also further reduces the query processing time.

Persistent link to this record

http://hdl.handle.net/1959.4/51214

DOI

https://doi.org/10.26190/unsworks/23811

Author(s)

Shang, Haichuan

Supervisor(s)

Lin, Xuemin

Publication Year

2011

Resource Type

Thesis

Degree Type

PhD Doctorate

UNSW Faculty

Files

whole.pdf

738.58 KB

Adobe Portable Document Format

View full record Show statistics

Library

Efficient algorithms for graph substructure search

Access & Terms of Use

Altmetric

Abstract

Persistent link to this record

DOI

Link to Publisher Version

Link to Open Access Version

Additional Link

Author(s)

Supervisor(s)

Creator(s)

Editor(s)

Translator(s)

Curator(s)

Designer(s)

Arranger(s)

Composer(s)

Recordist(s)

Conference Proceedings Editor(s)

Other Contributor(s)

Corporate/Industry Contributor(s)

Publication Year

Resource Type

Degree Type

UNSW Faculty

Files

Related dataset(s)