I/O efficient cohesive subgraph search in large graphs

Download files
Access & Terms of Use
open access
Copyright: Yuan, Long
Altmetric
Abstract
Cohesive subgraph search has been recently studied for its large number of applications in social networks, web search, collaboration networks, and biology. Most existing approaches for cohesive subgraph search assume that the graph is kept in memory of a machine. However, many real-world graphs are big and may not reside in memory. Consequently, the I/Os between fast internal memory and slower disks can be a major performance bottleneck for these algorithms when the graph is too large to fit completely in memory. Therefore, there is an emerging call for I/O efficient algorithms for cohesive subgraph search. In this paper, we study three important problems in cohesive subgraph search with respect to the I/O efficiency: diversified top-k clique search, edge connected component decomposition and Steiner component search. Firstly, we investigate the I/O efficient algorithm for diversified top-k clique search problem. We devise an algorithm to maintain k candidates in the process of maximal clique enumeration. Our algorithm has limited memory footprint and can achieve a guaranteed approximation ratio. We also introduce a novel light-weight PNP-Index, based on which we design an optimal maximal clique maintenance algorithm. We further explore three optimization strategies to avoid enumerating all maximal cliques and thus largely reduce the computational cost. For the massive input graph, we also develop an I/O efficient algorithm to tackle the problem when the input graph cannot fit in main memory. Secondly, we study the problem of edge connected component decomposition. We introduce two elegant graph reduction operators and propose three novel I/O efficient algorithms, Bottom-Up, Top-Down, and Hybrid, that explore the k values in different orders to perform the edge connected component decomposition. Finally, we investigate the problem of computing the Steiner component with the maximum connectivity, which aims to find the maximum induced subgraph that has the maximum connectivity for a given set of query nodes. To efficiently address the problem when the input graph cannot be loaded in memory, we propose an index, SMCC-Index, whose memory size can be bounded by O(n), where n is the number of nodes of the input graph. With the help of SMCC-Index, we can finish the Steiner component search with optimal I/Os. We also propose two I/O efficient algorithms to build SMCC-Index.
Persistent link to this record
Link to Publisher Version
Link to Open Access Version
Additional Link
Author(s)
Yuan, Long
Supervisor(s)
Lin, Xuemin
Qin, Lu
Creator(s)
Editor(s)
Translator(s)
Curator(s)
Designer(s)
Arranger(s)
Composer(s)
Recordist(s)
Conference Proceedings Editor(s)
Other Contributor(s)
Corporate/Industry Contributor(s)
Publication Year
2017
Resource Type
Thesis
Degree Type
PhD Doctorate
UNSW Faculty
Files
download public version.pdf 4.95 MB Adobe Portable Document Format
Related dataset(s)