Deep Graph Representation Learning and its Application on Graph Clustering
Authors: Zheng, H.
Conference: Bournemouth University, Faculty of Science and Technology
Abstract: Graphs such as social networks, molecular graphs, and traffic networks are ubiquitous in the real world. Deep Graph Representation Learning (DGL) is essential for most graph applications, such as Graph Classification, Link Prediction, and Community Detection. DGL has made significant progress in recent years owing to the development of Graph Neural Networks (GNNs). However, the field still faces several crucial challenges in (semi-)supervised DGL, self-supervised DGL, and DGL-based graph clustering. In this thesis, I propose three models to address the problems in these three aspects respectively.
GNNs have been widely used in DGL problems. However, GNNs suffer from over-smoothing due to their repeated local aggregation, and from over-squashing due to the exponential growth of computation paths with increased model depth, which confines their expressive power. To address this problem, a Hierarchical Structure Graph Transformer called HighFormer is proposed to leverage both local and relatively global structure information. I use GNNs to learn the initial graph node representations based on local structure information. At the same time, a structural attention module is used to learn relatively global structural similarity. Then, an improved attention matrix is obtained by adding the relatively global structure similarity matrix to the traditional attention matrix. Finally, the graph representation is learned from the improved attention matrix.
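The attention modification described above — adding a global structure similarity matrix to the standard attention scores before normalization — can be sketched as follows. This is an illustrative NumPy sketch under my own assumptions (the function names, the use of scaled dot-product attention, and the toy inputs are not from the thesis), not HighFormer's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def improved_attention(H, S):
    """H: node representations (n, d) from a GNN encoder (local structure);
    S: relatively global structure similarity matrix (n, n).
    Returns attended node representations."""
    d = H.shape[1]
    scores = H @ H.T / np.sqrt(d)   # traditional (scaled dot-product) attention scores
    A = softmax(scores + S)         # add global structure similarity, then normalize
    return A @ H                    # aggregate node representations with improved attention

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))                    # toy node representations
S = rng.normal(size=(5, 5)); S = (S + S.T) / 2  # toy symmetric similarity matrix
Z = improved_attention(H, S)                   # Z has shape (5, 8)
```

The key design point is that the similarity matrix acts as an additive bias on the attention logits, so nodes that are structurally similar at a larger scale attend to each other even when they are not local neighbors.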
Graph contrastive learning (GCL) has recently become the most powerful method in self-supervised graph representation learning (SGL), in which graph augmentation is a critical component for generating different views of input graphs. Most existing GCL methods perform stochastic data augmentation, for example randomly dropping edges or masking node features. However, such uniform transformations, applied without carefully designed augmentation techniques, may drastically change the underlying semantics of graphs or graph nodes. I argue that graph augmentation schemes should preserve the intrinsic semantics of graphs. Moreover, existing GCL methods neglect semantic information, which may introduce false-negative samples. Therefore, a novel GCL method termed SemiGCL is proposed, combining a semantic invariance graph augmentation (SemiAug) with a semantic-based graph contrastive (SGC) scheme.
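To make the false-negative issue concrete, here is a minimal NumPy sketch of an InfoNCE-style contrastive loss in which cross-view node pairs sharing a semantic label are excluded from the negatives. The function name, the InfoNCE form, and the temperature value are assumptions for illustration, not the thesis's actual SGC scheme.

```python
import numpy as np

def semantic_contrastive_loss(z1, z2, labels, tau=0.5):
    """z1, z2: L2-normalized embeddings of two graph views, shape (n, d).
    labels: per-node semantic assignments; cross-view pairs with equal labels
    are treated as potential false negatives and dropped from the denominator."""
    sim = np.exp(z1 @ z2.T / tau)              # cross-view similarity matrix
    pos = np.diag(sim)                          # same node in the other view (positives)
    same = labels[:, None] == labels[None, :]   # semantically equivalent pairs
    np.fill_diagonal(same, False)               # keep each node's own positive pair
    neg = np.where(same, 0.0, sim).sum(axis=1) - pos  # semantics-filtered negatives
    return float(np.mean(-np.log(pos / (pos + neg))))

rng = np.random.default_rng(0)
z1 = rng.normal(size=(6, 4)); z1 /= np.linalg.norm(z1, axis=1, keepdims=True)
z2 = z1 + 0.1 * rng.normal(size=(6, 4)); z2 /= np.linalg.norm(z2, axis=1, keepdims=True)
labels = np.array([0, 0, 1, 1, 2, 2])
loss = semantic_contrastive_loss(z1, z2, labels)
```

Without the `same`-label mask, node 0 would be pushed away from node 1 even though they share a semantic class — exactly the false-negative effect the abstract describes.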
Deep graph clustering (DGC), which aims to divide graph nodes into different clusters, is a challenging graph-analysis task. DGC usually consists of an encoding neural network and a clustering method. Although DGC has made remarkable progress with the development of deep learning, I observed two drawbacks in existing methods: 1) Existing methods usually overlook global structural information in the node encoding process; consequently, the discriminative capability of the representations is limited. 2) Most existing methods leverage traditional clustering methods such as K-means and spectral clustering. However, these clustering methods cannot be trained jointly with the DGL methods, leading to sub-optimal clustering performance. To address these issues, I propose a novel self-supervised DGC method termed Structural Semantic Contrastive Deep Graph Clustering (SECRET). To obtain a more discriminative representation, I design a structure contrastive scheme (SCS) that contrasts the aggregation of first-order neighbors with a graph diffusion. A consistency loss is also proposed to keep the structure of the different views consistent. To jointly optimize the DGL and clustering methods, I propose a novel Self-supervised Deep-learning-based Clustering (SDC) model.
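The structure contrastive idea — pairing a first-order neighbor aggregation with a graph diffusion view of the same graph — can be illustrated as below. The choice of personalized-PageRank diffusion and of a mean-squared consistency term between the two views' structures are assumptions made for this sketch; they are not claimed to match SECRET's exact formulation.

```python
import numpy as np

def normalize_adj(A):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A+I) D^-1/2."""
    A = A + np.eye(len(A))
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return d_inv_sqrt @ A @ d_inv_sqrt

def ppr_diffusion(A, alpha=0.15):
    """Closed-form personalized-PageRank diffusion: alpha * (I - (1-alpha) T)^-1."""
    T = normalize_adj(A)
    return alpha * np.linalg.inv(np.eye(len(A)) - (1.0 - alpha) * T)

# Toy graph: 4 nodes with identity features.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.eye(4)

view_local = normalize_adj(A) @ X   # first-order neighbor aggregation (local view)
view_diff = ppr_diffusion(A) @ X    # diffusion view (relatively global structure)

# Consistency term: penalize disagreement between the two views' structures.
consistency = float(np.mean((view_local @ view_local.T - view_diff @ view_diff.T) ** 2))
```

The diffusion view mixes in multi-hop structure, so contrasting it against the one-hop view gives the encoder a training signal that reflects global structure — addressing drawback 1) above.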
https://eprints.bournemouth.ac.uk/39691/
Source: Manual
Source: BURO EPrints