Behavioral Data Mining

Enlightenments, like accidents, happen only to prepared minds.

--Herbert Simon

Financial Behavioral Data Mining

Discovering and learning patterns from temporal, heterogeneous, and large-scale finance-related behavioral data

In this study, we aim to discover and learn patterns and knowledge from temporal, heterogeneous, and large-scale financial behavioral data, which is typically long and fast-growing. We propose online and distributed algorithms, together with theories of pattern interestingness, to unearth useful patterns. The proposed algorithms have been applied effectively to time series of prices and trading transactions. Recently, we have focused on detecting anomalies in user trading behaviors through graph mining techniques. Our methods are also employed in related applications in intelligent transportation and computational journalism.

Online Credit Payment Fraud Detection via Structure-Aware Hierarchical Recurrent Neural Network (IJCAI 2021)

In this paper, we adopt multi-scale behavior sequences generated from different granularities of web page structures and propose a model named SAH-RNN to consume them for online payment fraud detection. SAH-RNN has stacked RNN layers in which the upper layers, which model more compendious behaviors, are updated less frequently and receive summarized representations from the lower layers. A dual attention mechanism is devised to capture the impact of both sequential information within the same sequence and structural information among different granularities of web pages. [paper]

Intention-aware Heterogeneous Graph Attention Networks for Fraud Transactions Detection (KDD 2021)

In this paper, a heterogeneous transaction-intention network is devised to leverage the cross-interaction information over transactions and intentions. The network consists of two types of nodes, namely transaction and intention nodes, and two types of edges, i.e., transaction-intention and transaction-transaction edges. We then propose a graph neural method coined IHGAT that not only perceives sequence-like intentions but also encodes the relationships among transactions. Extensive experiments on a real-world dataset from the Alibaba platform show that our proposed algorithm outperforms state-of-the-art methods in both offline and online modes. [paper]

Pick and Choose: A GNN-based Imbalanced Learning Approach for Fraud Detection (WWW 2021)

To remedy the class imbalance problem of graph-based fraud detection, we propose a Pick and Choose Graph Neural Network (PC-GNN for short) for imbalanced supervised learning on graphs. First, nodes and edges are picked with a devised label-balanced sampler to construct sub-graphs for mini-batch training. Next, for each node in the sub-graph, the neighbor candidates are chosen by a proposed neighborhood sampler. Finally, information from the selected neighbors and different relations is aggregated to obtain the final representation of a target node. Experiments on both benchmark and real-world graph-based fraud detection tasks demonstrate that PC-GNN clearly outperforms state-of-the-art baselines. [paper][code]
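As a rough illustration of the "pick" step only, the sketch below draws training nodes with weights inversely proportional to the frequency of their class, so that fraud and benign nodes are picked about equally often. It is not the sampler from the paper: the weighting rule, the function name label_balanced_sample, and the toy labels are all assumptions made for illustration, and graph structure is ignored.

```python
import random
from collections import Counter

def label_balanced_sample(labels, k, seed=0):
    """Draw k node indices (with replacement), weighting each node by the
    inverse frequency of its class so minority-class nodes are picked more often.

    Assumption: plain inverse-class-frequency weights; a real sampler may
    also take graph structure into account.
    """
    counts = Counter(labels)
    weights = [1.0 / counts[y] for y in labels]
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=weights, k=k)

# Toy data (hypothetical): 95 benign nodes (label 0) and 5 fraud nodes (label 1).
labels = [0] * 95 + [1] * 5
picked = label_balanced_sample(labels, k=1000)
fraud_share = sum(labels[i] for i in picked) / len(picked)
print(f"fraud share in sample: {fraud_share:.2f}")
```

Because each class carries the same total weight, the sampled fraud share is close to one half even though fraud makes up only 5% of the toy labels.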

Credit Risk and Limits Forecasting in E-Commerce Consumer Lending Service via Multi-view-aware Mixture-of-experts Nets (WSDM 2021)

In this paper, we propose an end-to-end multi-view and multi-task learning based approach named MvMoE (Multi-view-aware Mixture-of-Experts network) to solve credit risk and limits forecasting simultaneously. First, a multi-view network with a hierarchical attention mechanism is constructed to distill users' heterogeneous financial information into shared hidden representations. Then, we jointly train these two tasks with a view-aware multi-gate mixture-of-experts network and a subsequent progressive network to improve their performance. On a real-world dataset containing 5.44 million users, we demonstrate that the proposed model improves AP by over 5.60% on credit risk forecasting and MAE by over 9.52% on credit limits forecasting. [paper]

Learning to Undersampling for Class Imbalanced Credit Risk Forecasting (ICDM 2020)

In this paper, we propose a semi-supervised meta-learning based approach called TRUST (TRainable Undersampling with Self Training) to resolve the class-imbalance problem in credit risk forecasting. First, it decides whether to sample the data through meta-learning based reinforcement learning. Second, it learns the distribution of the data that have not yet shown financial performance via self-training and updates the model trained in the first step. Finally, the updated model is evaluated on the validation dataset, and the result is fed back through the evaluator. These three steps are iterated until the model converges. Experimental results on a real-world industrial dataset containing 1.75 million users show that the proposed method improves AP by over 5.94% on the credit risk forecasting task compared with recent methods. [paper]

Alike and Unlike: Resolving Class Imbalance Problem in Financial Credit Risk Assessment (CIKM 2020)

In this paper, we propose a novel adversarial data augmentation method to solve the class imbalance problem in financial credit risk assessment. We train a generator for synthetic sample generation with a discriminator to identify real or fake instances. Besides, an auxiliary risk discriminator is trained cooperatively with the generator to assess the credit risk. Experimental results on three real-world datasets demonstrate the effectiveness of the proposed framework. [paper]

Fraud Transactions Detection via Behavior Tree with Local Intention Calibration (KDD 2020)

In this paper, we devise a tree-like structure named behavior tree to reorganize user behavioral data, in which a group of successive actions denoting a specific user intention is represented as a branch of the tree. We then propose a novel neural method coined LIC Tree-LSTM (Local Intention Calibrated Tree-LSTM) that uses the behavior tree for fraud transaction detection. We investigate the effectiveness of LIC Tree-LSTM on a real-world dataset from the Alibaba platform, and the experimental results show that our proposed algorithm outperforms state-of-the-art methods in both offline and online modes. [paper]

Financial Defaulter Detection on Online Credit Payment via Multi-view Attributed Heterogeneous Information Network (WWW 2020)

In this paper, we propose a multi-view attributed heterogeneous information network based approach coined MAHINDER for defaulter detection. First, multiple views of user behaviors are adopted to learn personal profiles, owing to the endogenous aspect of financial default. Second, local behavioral patterns are specifically modeled, since financial default is adversarial and accumulated. The experimental results on real-world datasets from the Alibaba platform show that the proposed approach improves AUC by over 2.8% and Recall@Precision=0.1 by over 13.1% compared with state-of-the-art methods. [paper]
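For readers unfamiliar with the Recall@Precision metric, the sketch below shows one plausible way to compute it: the best recall achievable over all score thresholds whose precision is at least the target value. This reading of the metric is an assumption rather than a definition taken from the paper, and the function name recall_at_precision and the toy scores are hypothetical.

```python
def recall_at_precision(scores, labels, min_precision):
    """Best recall over score thresholds whose precision is >= min_precision.

    Assumption: labels are 0/1 and higher scores mean "more likely positive".
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    best, tp = 0.0, 0
    for n, (score, y) in enumerate(ranked, start=1):
        tp += y
        # Only cut between distinct scores so tied items fall on the same side.
        is_cut = n == len(ranked) or ranked[n][0] != score
        if is_cut and tp / n >= min_precision:
            best = max(best, tp / total_pos)
    return best

# Toy example (hypothetical scores and labels).
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 0]
loose = recall_at_precision(scores, labels, min_precision=0.6)
strict = recall_at_precision(scores, labels, min_precision=0.9)
```

In the toy example, requiring precision of at least 0.6 still allows both positives to be recovered, while requiring 0.9 allows only the top-ranked one.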

Spatiotemporal Activity Modeling via Hierarchical Cross-Modal Embedding (IEEE TKDE 2020)

In this paper, we construct two graphs to represent the user interactions on social media and propose a hierarchical cross-modal embedding method that takes the high-order relationships into consideration. The key notion behind our method is a novel hierarchical embedding framework with meta-graphs connecting different layers. We introduce both inter-record and intra-record meta-graph structures, which enable learning distributed representations that preserve high-order proximities across graphs from different layers. Our empirical experiments on three real-world datasets demonstrate that our method not only outperforms state-of-the-art methods for spatiotemporal activity prediction, but also captures cross-modal proximity at a finer granularity. [paper]

Online Frequent Episode Mining (ICDE 2015)

Most existing FEM (Frequent Episode Mining) solutions are time-consuming. For fast-growing sequence data, old episodes may become obsolete while new useful episodes keep emerging. We propose an algorithm named MESELO (Mining frEquent Serial Episode via Last Occurrence), which uses an episode trie to store all minimal occurrences of episodes and adapts to rapidly growing data. We theoretically prove the soundness and completeness of the proposed algorithm, and experimental results on both synthetic and real datasets show its superiority. [paper][code]
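To make the notion of a minimal occurrence concrete, the sketch below finds the minimal occurrences of a single serial episode in one pass over an event sequence. A minimal occurrence is a time window that contains the episode's events in order while no proper sub-window does. This is not MESELO and does not use an episode trie; the function name minimal_occurrences, the (timestamp, event) input format, and the assumption of strictly increasing timestamps are ours.

```python
def minimal_occurrences(sequence, episode):
    """Return [(start, end), ...], the minimal occurrences of a serial episode.

    sequence: list of (timestamp, event_type) with strictly increasing timestamps
    episode:  ordered list of event types, e.g. ["A", "B"]
    """
    k = len(episode)
    # latest[i]: latest start time of an occurrence of episode[:i+1] seen so far.
    latest = [None] * k
    found = []
    for t, e in sequence:
        # Walk positions from last to first so that one event never extends
        # a prefix that it has itself just completed.
        for i in range(k - 1, -1, -1):
            if e != episode[i]:
                continue
            if i == 0:
                latest[0] = t
            elif latest[i - 1] is not None:
                latest[i] = latest[i - 1]
            else:
                continue
            # A completed occurrence is minimal only if it starts later than
            # the previously reported one.
            if i == k - 1 and (not found or latest[i] > found[-1][0]):
                found.append((latest[i], t))
    return found

# Toy sequence (hypothetical data).
seq = [(1, "A"), (2, "A"), (3, "B"), (4, "B"), (5, "A"), (6, "B")]
occ = minimal_occurrences(seq, ["A", "B"])
print(occ)
```

On the toy sequence, the window (1, 3) is not reported because the narrower window (2, 3) already contains the episode.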

Mining Precise-Positioning Episode Rules from Event Sequences (ICDE 2017, IEEE TKDE 2018)

We introduce the concept of a fixed-gap episode and develop a trie-based data structure, together with several pruning strategies, to mine such precise-positioning episode rules. A fixed-gap episode consists of an ordered set of events where the elapsed time between any two consecutive events is a constant. Experimental results on real datasets show that the solution can also satisfy the requirements of many time-sensitive applications. [paper][code]
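The definition of a fixed-gap episode can be illustrated with a small matcher that reports every start time at which the episode's events occur with exactly the prescribed gaps. This is only a brute-force sketch of the definition, not the trie-based miner or its pruning strategies; the function name fixed_gap_occurrences, the input format, and the choice to allow a separate constant gap for each consecutive pair are assumptions.

```python
from collections import defaultdict

def fixed_gap_occurrences(sequence, events, gaps):
    """Return start times at which `events` occur with exactly the given gaps.

    sequence: iterable of (timestamp, event_type) pairs
    events:   ordered event types, e.g. ["A", "B", "C"]
    gaps:     elapsed time between consecutive events, e.g. [2, 1]
    """
    if len(gaps) != len(events) - 1:
        raise ValueError("need one gap per consecutive event pair")
    # Index the timestamps at which each event type occurs.
    times = defaultdict(set)
    for t, e in sequence:
        times[e].add(t)
    starts = []
    for t0 in sorted(times[events[0]]):
        t, ok = t0, True
        for e, g in zip(events[1:], gaps):
            t += g
            if t not in times[e]:
                ok = False
                break
        if ok:
            starts.append(t0)
    return starts

# Toy sequence (hypothetical data).
seq = [(1, "A"), (2, "B"), (3, "B"), (4, "C"), (5, "A"), (7, "B"), (9, "C")]
hits = fixed_gap_occurrences(seq, ["A", "B", "C"], [2, 1])
print(hits)
```

In the toy sequence, the episode starting at time 5 is rejected because C arrives two time units after B instead of one, which is the kind of precise positioning that an ordinary serial episode would not distinguish.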

Large-Scale Frequent Episode Mining from Complex Event Sequences with Hierarchies (ACM TIST 2019)

In this work, we propose a scalable distributed framework, LA-FEMH (Large-scale Frequent Episode Mining with Hierarchies), which partitions the sequence into pieces. We adopt optimized rewriting techniques and devise a local mining algorithm, PEM (Peak Episode Miner), to improve local mining performance. We also extend our framework and propose LA-FEMH+ to support other episode mining tasks, such as maximal and closed episode mining, in the context of event hierarchies. [paper]