Graph learning for Financial Applications

Enlightenments, like accidents, happen only to prepared minds.

--Herbert Simon

Modeling temporal, heterogeneous, and large-scale graph-structured data for financial applications such as risk assessment, investment advising, and market behavior analysis.

This research focuses on discovering and modeling latent patterns and domain knowledge from temporal, heterogeneous, and large-scale graph-structured data, which are typically long-range and rapidly evolving. We aim to detect anomalies in user trading behaviors by leveraging advanced graph learning techniques tailored for complex financial environments. Recently, we have been exploring the integration of large language models (LLMs) with graph learning to enable mutual enhancement between semantic understanding and structural reasoning. Selected recent advancements in this direction are summarized below.

GRASP: Differentially Private Graph Reconstruction Defense with Structured Perturbation (KDD 2025)

In this paper, we propose a novel Differentially Private Graph Neural Network based on Structured Perturbation (GRASP), which combines independent and identical noise to achieve bidirectional shifts in the embedding similarity distribution, thereby effectively disrupting the ranking structure and enhancing defense against Graph Reconstruction Attacks (GRA). [paper] [code]

SPEAR: A Structure-Preserving Manipulation Method for Graph Backdoor Attacks (WWW 2025)

To enhance the stealthiness of graph backdoors, we propose SPEAR, a novel structure-preserving graph backdoor attack that avoids modifying the graph's topology. SPEAR operates within a limited attack budget by selectively perturbing node attributes while ensuring the triggers exert significant influence through a global importance-driven feature selection strategy. Additionally, a neighborhood-aware trigger generator is employed to underpin a high attack success rate by utilizing semantic information from the neighborhood. SPEAR amplifies effectiveness and stealthiness by combining subtle yet impactful attribute manipulation with a refined trigger generation mechanism. [paper] [code]

Dynamic Graph Learning with Static Relations for Credit Risk Assessment (AAAI 2025)

We propose DGNN-SR, a novel framework for credit risk assessment that jointly encodes dynamic transaction graphs and static fund transfer graphs. DGNN-SR uses a multi-view time encoder to capture both relative and absolute temporal information, and introduces an adaptive re-weighting strategy to fuse static relations into dynamic representations. Experiments on real-world datasets show DGNN-SR achieves a 0.85%–2.5% performance improvement over existing SOTA methods. [paper]

LOGIN: A Large Language Model Consulted Graph Neural Network Training Framework (WSDM 2025)

We introduce LOGIN, a framework that integrates Large Language Models (LLMs) as consultants within GNN training. LOGIN crafts semantic and topological prompts for nodes and adaptively leverages LLM responses to refine GNNs. Experiments on node classification tasks show that even basic GNNs, with LLM consultation, can match the performance of advanced GNNs. [paper] [code]

Boosting the Adversarial Robustness of Graph Neural Networks: An OOD Perspective (ICLR 2024)

We present a new adversarial training paradigm for graph attack defense by re-examining both poisoning and evasion attacks from an out-of-distribution (OOD) perspective. Our method incorporates OOD detection into adversarial training, addressing the shortcomings of conventional approaches and improving robustness against adaptive attacks. [paper] [code]

F2GNN: An Adaptive Filter with Feature Segmentation for Graph-based Fraud Detection (ICASSP 2024)

We propose F2GNN, an adaptive filter with feature segmentation for graph-based fraud detection. By segmenting user features and applying adaptive graph filters to each segment, F2GNN better captures subtle fraudulent behaviors and addresses class imbalance. Experiments on real-world datasets show that F2GNN outperforms state-of-the-art methods. [paper] [code]

Incomplete Graph Learning via Attribute-Structure Decoupled Variational Auto-Encoder (WSDM 2024)

We propose ASD-VAE, a neural model that learns a shared latent space from both attribute and structural views of graphs to robustly handle high rates of missing node attributes. ASD-VAE uses a coupled-and-decoupled learning process for multimodal fusion and imputation. Experiments on four real-world incomplete graph datasets show that ASD-VAE outperforms state-of-the-art methods in dealing with missing attributes and improves downstream graph learning tasks. [paper] [code]

FLOOD: A Flexible Invariant Learning Framework for Out-of-Distribution Generalization on Graphs (KDD 2023)

We propose FLOOD, a framework for OOD generalization on graphs that combines invariant learning with bootstrapped self-supervised refinement. FLOOD learns invariant representations across augmented environments and enables flexible encoder adaptation to the test distribution. Experiments show FLOOD consistently outperforms prior graph OOD generalization methods for both transductive and inductive node classification tasks. [paper]

Revisiting Graph Adversarial Attack and Defense From a Data Distribution Perspective (ICLR 2023)

We reveal that gradient-based attacks on GNNs for semi-supervised node classification concentrate adversarial edges around training nodes, explaining their effectiveness from a data distribution perspective. Based on this insight, we provide nine practical tips for both attack and defense, and propose a fast attack method and a self-training defense method that outperform state-of-the-art approaches and scale to large graphs. Extensive experiments on benchmark datasets validate our findings. [paper] [code]

Along the Time: Timeline-traced Embedding for Temporal Knowledge Graph Completion (CIKM 2022)

We propose TLT-KGE, a method for temporal knowledge graph embedding that encodes semantic and temporal information as different axes in complex or quaternion spaces. By modeling their independence and interaction, TLT-KGE enables better distinction of entities and relations across timestamps. Experiments show that TLT-KGE significantly outperforms state-of-the-art methods on temporal knowledge graph completion tasks. [paper] [code]

Explainable Graph-based Fraud Detection via Neural Meta-graph Search (CIKM 2022, Best Short Paper Honorable Mention)

We propose NGS, a framework that formalizes GNN message passing as a meta-graph and uses neural architecture search to optimize the graph structure for fraud detection. By aggregating multiple searched meta-graphs, NGS achieves superior performance and provides interpretable explanations. Experiments on real-world datasets show NGS outperforms state-of-the-art baselines. [paper]

UD-GNN: Uncertainty-aware Debiased Training on Semi-Homophilous Graphs (KDD 2022)

We introduce an Uncertainty-aware Debiasing (UD) framework that addresses GNN bias toward homophilous nodes in mixed-structure graphs. UD estimates output uncertainty to identify heterophilous nodes, then prunes and retrains GNN parameters to improve performance on these nodes. Applied to both homophilous and heterophilous GNNs, UD consistently enhances performance and reduces the gap between homophilous and heterophilous nodes on various datasets. [paper]

Reliable Representations Make A Stronger Defender: Unsupervised Structure Refinement for Robust GNN (KDD 2022)

We propose STABLE, an unsupervised pipeline that learns robust node representations insensitive to structural perturbations for graph structure optimization. The refined graph is then used by an advanced GCN, which improves robustness over standard GCNs without additional computational cost. [paper] [code]

Bi-Level Selection via Meta Gradient for Graph-based Fraud Detection (DASFAA 2022, short paper)

We propose Bi-Level Selection (BLS), an algorithm that improves GNNs for fraud detection by selecting valuable nodes at both the instance and neighborhood levels using meta gradients from a clean validation set. BLS suppresses class imbalance and label noise, and can be applied to most GNNs. Experiments on real-world datasets show BLS significantly boosts GNN performance in fraud detection tasks. [paper]

AUC-oriented Graph Neural Network for Fraud Detection (WWW 2022)

We propose AO-GNN, a model that addresses label imbalance and noisy edges in GNN-based fraud detection by decoupling AUC maximization into classifier parameter search and edge pruning policy search. AO-GNN uses AUC-oriented stochastic gradients and a reinforcement learning module for edge pruning. Experiments on real-world datasets show AO-GNN significantly outperforms state-of-the-art baselines in AUC and other metrics. [paper]
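For readers unfamiliar with the metric that AO-GNN targets, the following is a minimal sketch of AUC computed directly as the fraction of (fraud, benign) pairs that a scorer ranks correctly. It is not the AO-GNN training procedure; the function name `pairwise_auc` and the toy scores are illustrative assumptions.

```python
from typing import Sequence


def pairwise_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (fraud, benign) pairs where the fraud node scores higher.

    Ties count as half a correct pair. Labels: 1 = fraud, 0 = benign.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up scores: 2 fraud nodes among 6.
    scores = [0.9, 0.2, 0.25, 0.1, 0.35, 0.3]
    labels = [1, 0, 1, 0, 0, 0]
    print(pairwise_auc(scores, labels))
```

Because this quantity depends only on how fraud nodes rank against benign nodes, it does not change with the ratio between the two classes, which is one reason AUC is a common target under label imbalance.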

Online Credit Payment Fraud Detection via Structure-Aware Hierarchical Recurrent Neural Network (IJCAI 2021)

In this paper, we adopt multi-scale behavior sequences generated from different granularities of web page structures and propose a model named SAH-RNN that consumes these sequences for online payment fraud detection. SAH-RNN has stacked RNN layers in which the upper layers, which model more summarized behaviors, are updated less frequently and receive summarized representations from the lower layers. A dual attention mechanism is devised to capture both sequential information within the same sequence and structural information across web pages of different granularity. [paper]

Intention-aware Heterogeneous Graph Attention Networks for Fraud Transactions Detection (KDD 2021)

In this paper, we devise a heterogeneous transaction-intention network to leverage the cross-interaction information over transactions and intentions. The network consists of two types of nodes, namely transaction and intention nodes, and two types of edges, i.e., transaction-intention and transaction-transaction edges. We then propose a graph neural method coined IHGAT that not only perceives sequence-like intentions but also encodes the relationships among transactions. Extensive experiments on a real-world dataset from the Alibaba platform show that our proposed algorithm outperforms state-of-the-art methods in both offline and online modes. [paper]

Pick and Choose: A GNN-based Imbalanced Learning Approach for Fraud Detection (WWW 2021)

To remedy the class imbalance problem of graph-based fraud detection, we propose a Pick and Choose Graph Neural Network (PC-GNN for short) for imbalanced supervised learning on graphs. First, nodes and edges are picked with a devised label-balanced sampler to construct sub-graphs for mini-batch training. Next, for each node in the sub-graph, the neighbor candidates are chosen by a proposed neighborhood sampler. Finally, information from the selected neighbors and different relations is aggregated to obtain the final representation of a target node. Experiments on both benchmark and real-world graph-based fraud detection tasks demonstrate that PC-GNN clearly outperforms SOTA baselines. [paper] [code]
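The "pick" step can be illustrated with a minimal sketch of label-balanced node sampling. This assumes each node is weighted by the inverse of its class size, which is an illustrative simplification; the sampler in the paper may weight nodes differently (for example, by also using graph structure), and the function name `label_balanced_sample` is hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence


def label_balanced_sample(labels: Sequence[int], k: int, seed: int = 0) -> List[int]:
    """Draw k node indices with replacement, weighting each node by 1 / (size of its class).

    Assumption: inverse class frequency is the only weighting signal.
    """
    counts = Counter(labels)
    weights = [1.0 / counts[y] for y in labels]
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=weights, k=k)


if __name__ == "__main__":
    # Toy labels: 95 benign nodes (0) and 5 fraud nodes (1).
    labels = [0] * 95 + [1] * 5
    picked = label_balanced_sample(labels, k=1000)
    fraud_share = sum(labels[i] for i in picked) / len(picked)
    print(f"fraud share in sampled batch: {fraud_share:.2f}")
```

Under this weighting each class receives the same total sampling mass, so a mini-batch is expected to contain roughly equal numbers of fraud and benign nodes even when fraud is rare in the full graph.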

Credit Risk and Limits Forecasting in E-Commerce Consumer Lending Service via Multi-view-aware Mixture-of-experts Nets (WSDM 2021)

In this paper, we propose an end-to-end multi-view and multitask learning based approach named MvMoE (Multi-view-aware Mixture-of-Experts network) to solve credit risk and limits forecasting simultaneously. First, a multi-view network with a hierarchical attention mechanism is constructed to distill users’ heterogeneous financial information into shared hidden representations. Then, we jointly train these two tasks with a view-aware multi-gate mixture-of-experts network and a subsequent progressive network to improve their performance. On a real-world dataset containing 5.44 million users, we demonstrate that the proposed model improves AP by over 5.60% on credit risk forecasting and MAE by over 9.52% on credit limits forecasting. [paper]

Learning to Undersampling for Class Imbalanced Credit Risk Forecasting (ICDM 2020)

In this paper, we propose a semi-supervised, meta-learning-based approach called TRUST (TRainable Undersampling with Self Training) to resolve the class-imbalance problem in credit risk forecasting. First, it decides whether to sample the data through meta-learning-based reinforcement learning. Second, it learns the distribution of the data that have not yet shown financial performance via self-training and updates the model trained in the first step. Finally, the updated model is evaluated on the validation dataset, and the result is fed back through the evaluator. These three steps are iterated until the model converges. Experimental results on a real-world industrial dataset containing 1.75 million users show that the proposed method improves AP by over 5.94% on the credit risk forecasting task compared with recent methods. [paper]

Alike and Unlike: Resolving Class Imbalance Problem in Financial Credit Risk Assessment (CIKM 2020)

In this paper, we propose a novel adversarial data augmentation method to solve the class imbalance problem in financial credit risk assessment. We train a generator for synthetic sample generation with a discriminator to identify real or fake instances. Besides, an auxiliary risk discriminator is trained cooperatively with the generator to assess the credit risk. Experimental results on three real-world datasets demonstrate the effectiveness of the proposed framework. [paper]

Fraud Transactions Detection via Behavior Tree with Local Intention Calibration (KDD 2020)

In this paper, we devise a tree-like structure named behavior tree to reorganize user behavioral data, in which a group of successive sequential actions denoting a specific user intention is represented as a branch of the tree. We then propose a novel neural method coined LIC Tree-LSTM (Local Intention Calibrated Tree-LSTM) that uses the behavior tree for fraud transaction detection. We investigate the effectiveness of LIC Tree-LSTM on a real-world dataset from the Alibaba platform, and the experimental results show that our proposed algorithm outperforms state-of-the-art methods in both offline and online modes. [paper]

Financial Defaulter Detection on Online Credit Payment via Multi-view Attributed Heterogeneous Information Network (WWW 2020)

In this paper, we propose a multi-view attributed heterogeneous information network based approach coined MAHINDER for defaulter detection. First, multiple views of user behaviors are adopted to learn personal profiles, owing to the endogenous aspect of financial default. Second, local behavioral patterns are specifically modeled, since financial default is adversarial and accumulated. The experimental results on real-world datasets from the Alibaba platform show that the proposed approach improves AUC by over 2.8% and Recall@Precision=0.1 by over 13.1% compared with state-of-the-art methods. [paper]

Spatiotemporal Activity Modeling via Hierarchical Cross-Modal Embedding (IEEE TKDE 2020)

In this paper, we construct two graphs to represent the user interactions on social media and propose a hierarchical cross-modal embedding method that takes the high-order relationships into consideration. The key notion behind our method is a novel hierarchical embedding framework with meta-graphs connecting different layers. We introduce both inter-record and intra-record meta-graph structures, which enable learning distributed representations that preserve high-order proximities across graphs from different layers. Our empirical experiments on three real-world datasets demonstrate that our method not only outperforms state-of-the-art methods for spatiotemporal activity prediction, but also captures cross-modal proximity at a finer granularity. [paper]

Online Frequent Episode Mining (ICDE 2015)

Most existing FEM (Frequent Episode Mining) solutions are time-consuming. For fast-growing sequence data, old episodes may become obsolete while new useful episodes keep emerging. We propose an algorithm named MESELO (Mining frEquent Serial Episode via Last Occurrence), which uses an episode trie to store all minimal occurrences of episodes and adapts to rapidly growing data. We theoretically prove the soundness and completeness of the proposed algorithm, and experimental results on both synthetic and real datasets show its superiority. [paper] [code]

Mining Precise-Positioning Episode Rules from Event Sequences (ICDE 2017, IEEE TKDE 2018)

We introduce the concept of a fixed-gap episode and develop a trie-based data structure to mine such precise-positioning episode rules with several pruning strategies. A fixed-gap episode consists of an ordered set of events where the elapsed time between any two consecutive events is a constant. Experimental results on real datasets show the solution can also satisfy the requirements of many time-sensitive applications. [paper] [code]
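To make the definition concrete, here is a minimal sketch that finds occurrences of one fixed-gap episode in a timestamped event sequence by direct scanning. It is not the trie-based miner from the paper; the representation (a first event followed by `(gap, event)` pairs), the function name `fixed_gap_occurrences`, and the toy data are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Assumed representation: timestamp -> set of events observed at that time.
EventSequence = Dict[int, Set[str]]


def fixed_gap_occurrences(
    sequence: EventSequence,
    first_event: str,
    rest: List[Tuple[int, str]],
) -> List[int]:
    """Return start times at which the episode occurs with exactly the given gaps.

    `rest` lists (gap, event) pairs: each event must appear exactly `gap`
    time units after the previous event of the episode.
    """
    starts = []
    for t in sorted(sequence):
        if first_event not in sequence[t]:
            continue
        current = t
        matched = True
        for gap, event in rest:
            current += gap
            if event not in sequence.get(current, set()):
                matched = False
                break
        if matched:
            starts.append(t)
    return starts


if __name__ == "__main__":
    # Toy sequence; episode: A, then B exactly 2 units later, then C exactly 1 unit after B.
    seq = {1: {"A"}, 3: {"B"}, 4: {"A", "C"}, 6: {"B"}, 7: {"C"}, 9: {"A"}}
    print(fixed_gap_occurrences(seq, "A", [(2, "B"), (1, "C")]))
```

In the toy sequence the episode starts at times 1 and 4; the `A` at time 9 does not count because no `B` follows exactly 2 units later.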

Large-Scale Frequent Episode Mining from Complex Event Sequences with Hierarchies (ACM TIST 2019)

In this work, we propose a scalable distributed framework, LA-FEMH (Large-scale Frequent Episode Mining with Hierarchies), which partitions the sequence into pieces. We adopt optimized rewrite techniques and devise a local mining algorithm, PEM (Peak Episode Miner), to improve local mining performance. We also extend the framework and propose LA-FEMH+ to support other episode mining tasks, such as maximal and closed episode mining, in the context of event hierarchies. [paper]