Distributed Subgraph Enumeration

dc.contributor.advisor Lin, Xuemin en_US
dc.contributor.advisor Qin, Lu en_US
dc.contributor.advisor Zhang, Ying en_US Lai, Longbin en_US 2022-03-22T15:09:48Z 2022-03-22T15:09:48Z 2017 en_US
dc.description.abstract Subgraph enumeration is a fundamental graph problem with many applications. However, existing algorithms for subgraph enumeration fall short in handling large graphs due to the computational hardness. In this work, we propose a general approach that solves subgraph enumeration in the distributed contexts, including MapReduce and Spark. The approach features a decomposition-and-join manner, in which the pattern graph is decomposed into a set of structures, called join unit. We introduce the distributed graph storage mechanism to determine what structure can be the join unit. Consequently, we obtain the results by joining the matches of all join units following a specific join structure. Based on the general approach, we first propose a star-based join framework. In the framework, we adopt a basic graph storage mechanism that only supports a star (a tree with depth 2) as the join unit, and we apply the left-deep join structure to process the join. We then show that a special star called TwinTwig - an edge or two incident edges of a node - is enough to guarantee instance optimality in the star-based join framework under reasonable assumptions, which inspires the TwinTwigJoin algorithm. We devise an A*-based algorithm to compute the optimal join plan for TwinTwigJoin. TwinTwigJoin is still not scalable to large graph because of the constraints in the left-deep join structure and that each join unit must be a star. We then explore the graph-based join framework that allows us to use more than just star as the join unit. In addition, we use the bushy join structure rather than left-deep join to guarantee the optimality of the join plan. Aware that it is storage-inefficient to use any structure as the join unit, we develop the SEED algorithm that implements an effective distributed graph storage mechanism to use star and clique (complete graph) as the join units. We then devise a dynamic-programming algorithm to compute an optimal bushy join plan. SEED frees us from the constraints in TwinTwigJoin, and greatly improves the performance of subgraph enumeration. Ultimately, we develop two data-compression techniques, namely compressed graph and clique compression, to further reduce the enormous cost while transferring and maintaining the (intermediate) results. en_US
dc.language English
dc.language.iso EN en_US
dc.publisher UNSW, Sydney en_US
dc.rights CC BY-NC-ND 3.0 en_US
dc.rights.uri en_US
dc.subject.other MapReduce en_US
dc.subject.other Subgraph Enumeration en_US
dc.subject.other Big Data en_US
dc.subject.other Join Optimization en_US
dc.subject.other Graph Database en_US
dc.title Distributed Subgraph Enumeration en_US
dc.type Thesis en_US
dcterms.accessRights open access
dcterms.rightsHolder Lai, Longbin
dspace.entity.type Publication en_US
unsw.relation.faculty Engineering
unsw.relation.originalPublicationAffiliation Lai, Longbin, Computer Science & Engineering, Faculty of Engineering, UNSW en_US
unsw.relation.originalPublicationAffiliation Lin, Xuemin, Computer Science & Engineering, Faculty of Engineering, UNSW en_US
unsw.relation.originalPublicationAffiliation Qin, Lu, QCIS, UTS en_US
unsw.relation.originalPublicationAffiliation Zhang, Ying, QCIS, UTS en_US School of Computer Science and Engineering *
unsw.thesis.degreetype PhD Doctorate en_US
Original bundle
Now showing 1 - 1 of 1
No Thumbnail Available
3.08 MB
Resource type