Cohesive subgraph computation over large heterogeneous information networks

Download files
Access & Terms of Use
open access
Embargoed until 2023-11-05
Copyright: Yang, Yixing
Heterogeneous information networks (HINs) are networks involving multiple typed objects and multiple typed links denoting different relations, and they are prevalent in various domains, such as bibliographic networks, social media networks, and knowledge networks. Recently, the topic of cohesive subgraphs discovery has gained plenty of attention. Typical cohesive subgraph models include k-core, k-truss, k-ECC, maximal-clique, quasi-clique, etc. Among the cohesive subgraph models, the k-core and the k-truss have been demonstrated to be outstanding, as they can achieve both high cohesiveness and high computational efficiency. Conceptually, the k-core of a graph is a subgraph in which each vertex participates in at least k-1 edges, while the k-truss of a graph is a subgraph in which each edge participates in at least k-2 triangles. Nevertheless, existing solutions mainly focus on homogeneous networks, where vertices are of the same type, and thus cannot be applied to HINs. In this thesis, we study the problem of cohesive subgraphs mining and analysis over large HINs. Firstly, we study the problem of k-core based community search over large HINs; that is, given a query vertex q, find a community from an HIN containing q, in which all the vertices are with the same type of q and have close relationships. To model the relationship between two vertices of the same type, we adopt the well-known concept of meta-path, which is a sequence of relations defined between different types of vertices. We then measure the cohesiveness of the community by extending the classic minimum degree metric with a meta-path. We further propose efficient query algorithms for finding communities using these cohesiveness metrics. Secondly, we study the problem of truss based cohesive subgraph computation over large HINs. As the classic truss model is based on the structure of triangle which does not exist in some HINs, we introduce two kinds of HIN tirangles for three vertices, regarding a specific meta-path P. The first one requires that each pair of vertices is connected by an instance of P, while the second one also has such a connectivity constraint but further needs that the three instances of P form a circle. Based on these two kinds of triangles, we propose two HIN truss models respectively. We further develop efficient truss computation algorithms. Finally, we study the problem of bi-triangle counting over a specific type of networks--a bipartite network, where a bi-triangle is a cycle with three vertices from one vertex set and three vertices from another vertex set. Counting bi-triangles has found many real applications such as computing the transitivity coefficient and clustering coefficient for bipartite networks. To enable efficient bi-triangle counting, we first develop a baseline algorithm relying on the observation that each bi-triangle can be considered as the join of three wedges. Then, we propose a more sophisticated algorithm which regards a bi-triangle as the join of two super-wedges, where a wedge is a path with two edges while a super-wedge is a path with three edges. We further optimize the algorithm by ranking vertices according to their degrees.
Persistent link to this record
Link to Publisher Version
Link to Open Access Version
Additional Link
Yang, Yixing
Conference Proceedings Editor(s)
Other Contributor(s)
Corporate/Industry Contributor(s)
Publication Year
Resource Type
Degree Type
PhD Doctorate
UNSW Faculty
download public version.pdf 2.63 MB Adobe Portable Document Format
Related dataset(s)