Distributed Subgraph Enumeration

Lai, Longbin

doi:10.26190/unsworks/19738

Publication:

Distributed Subgraph Enumeration

dc.contributor.advisor	Lin, Xuemin	en_US
dc.contributor.advisor	Qin, Lu	en_US
dc.contributor.advisor	Zhang, Ying	en_US
dc.contributor.author	Lai, Longbin	en_US
dc.date.accessioned	2022-03-22T15:09:48Z
dc.date.available	2022-03-22T15:09:48Z
dc.date.issued	2017	en_US
dc.description.abstract	Subgraph enumeration is a fundamental graph problem with many applications. However, existing algorithms for subgraph enumeration fall short in handling large graphs due to the computational hardness. In this work, we propose a general approach that solves subgraph enumeration in the distributed contexts, including MapReduce and Spark. The approach features a decomposition-and-join manner, in which the pattern graph is decomposed into a set of structures, called join unit. We introduce the distributed graph storage mechanism to determine what structure can be the join unit. Consequently, we obtain the results by joining the matches of all join units following a specific join structure. Based on the general approach, we first propose a star-based join framework. In the framework, we adopt a basic graph storage mechanism that only supports a star (a tree with depth 2) as the join unit, and we apply the left-deep join structure to process the join. We then show that a special star called TwinTwig - an edge or two incident edges of a node - is enough to guarantee instance optimality in the star-based join framework under reasonable assumptions, which inspires the TwinTwigJoin algorithm. We devise an A*-based algorithm to compute the optimal join plan for TwinTwigJoin. TwinTwigJoin is still not scalable to large graph because of the constraints in the left-deep join structure and that each join unit must be a star. We then explore the graph-based join framework that allows us to use more than just star as the join unit. In addition, we use the bushy join structure rather than left-deep join to guarantee the optimality of the join plan. Aware that it is storage-inefficient to use any structure as the join unit, we develop the SEED algorithm that implements an effective distributed graph storage mechanism to use star and clique (complete graph) as the join units. We then devise a dynamic-programming algorithm to compute an optimal bushy join plan. SEED frees us from the constraints in TwinTwigJoin, and greatly improves the performance of subgraph enumeration. Ultimately, we develop two data-compression techniques, namely compressed graph and clique compression, to further reduce the enormous cost while transferring and maintaining the (intermediate) results.	en_US
dc.identifier.uri	http://hdl.handle.net/1959.4/58084
dc.language	English
dc.language.iso	EN	en_US
dc.publisher	UNSW, Sydney	en_US
dc.rights	CC BY-NC-ND 3.0	en_US
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/3.0/au/	en_US
dc.subject.other	MapReduce	en_US
dc.subject.other	Subgraph Enumeration	en_US
dc.subject.other	Big Data	en_US
dc.subject.other	Join Optimization	en_US
dc.subject.other	Graph Database	en_US
dc.title	Distributed Subgraph Enumeration	en_US
dc.type	Thesis	en_US
dcterms.accessRights	open access
dcterms.rightsHolder	Lai, Longbin
dspace.entity.type	Publication	en_US
unsw.accessRights.uri	https://purl.org/coar/access_right/c_abf2
unsw.identifier.doi	https://doi.org/10.26190/unsworks/19738
unsw.relation.faculty	Engineering
unsw.relation.originalPublicationAffiliation	Lai, Longbin, Computer Science & Engineering, Faculty of Engineering, UNSW	en_US
unsw.relation.originalPublicationAffiliation	Lin, Xuemin, Computer Science & Engineering, Faculty of Engineering, UNSW	en_US
unsw.relation.originalPublicationAffiliation	Qin, Lu, QCIS, UTS	en_US
unsw.relation.originalPublicationAffiliation	Zhang, Ying, QCIS, UTS	en_US
unsw.relation.school	School of Computer Science and Engineering	*
unsw.thesis.degreetype	PhD Doctorate	en_US

Files

Original bundle

Now showing 1 - 1 of 1

Name:: public-version-longbin-thesis.pdf
Size:: 3.08 MB
Format:: application/pdf
Description:

Download

Resource type

Thesis

Publication: Distributed Subgraph Enumeration

Files

Original bundle

Resource type

Publication:

Distributed Subgraph Enumeration