Abstract
Recent advances in database research have shown the potential of graph in modelling
complicated data. Graph data have pervaded many modern applications
including bio-informatics, chem-informatics, semantic webs, software engineering,
etc. Efficient algorithms are strongly demanded in these applications to manage
and analyze graph data.
Thesis studies three fundamental problems in managing and analyzing graph
data by proposing efficient query processing algorithms. These problems include
(1) similarity all-matching (2) supergraph containment search and (3) subgraph
containment search.
We first study the problem of similarity all-matching to retrieve all similarity
matches of a given query graph in a data graph with the number of missing
edges bounded by a given similarity threshold. The only existing work proposes an
enumerate-search paradigm. Our first approach to similarity all-matching proposes
a novel query decomposition based hierarchical framework to effectively reduce the
number of intermediate matches and share the intermediate matches produced in
the enumerate-search paradigm. Our second approach to similarity all-matching
proposes a novel tree based spanning search paradigm to effectively reduce the
number of enumerated seeds and the corresponding searching processes.
Our second problem, supergraph containment search, is to retrieve all the data
graphs contained by a given query graph from a graph database. Previous works
follow the filtering-verification framework to first filter most negative answers and
then conduct verification on a small number of surviving candidates. Observing
that the framework is significantly challenged by the redundant computation between
the filtering phase and the verification phase, this thesis proposes a novel
computation-sharing framework to maximize the computation sharing between the
two phases.
Our last problem, subgraph containment search, is to retrieve all data graphs
containing a given query graph from a graph database. The existing indexing
techniques not only fail to address the redundant computation in the filtering phase
but also are inefficient to be directly applied on some important variant problems of
subgraph containment search. Motivated by these, this thesis proposes an efficient
query processing algorithms to reduce the filtering cost in subgraph containment
search and then develops efficient query processing algorithms for two important
variations of subgraph containment search.