User Modeling in Cyberspace

Enlightenments, like accidents, happen only to prepared minds.

--Herbert Simon

Machine learning methodologies to enhance the effectiveness and robustness of fraud detection and online recommendations.

In this project, we design lightweight, componentized models and methodologies aimed at advancing the understanding of user behavior in cyberspace. Our research focuses on developing fine-grained, efficient user representation techniques to support personalized profiling in scenarios such as risk management and intelligent recommendation.

Online Fraud Detection via Test-time Retrieval-based Representation Enrichment (AAAI 2025)

We propose TRE, a lightweight plug-in method that addresses concept drift in anti-fraud systems by retrieving and aggregating embeddings from the top-K most relevant recent samples during test time. This real-time representation enrichment enables classifiers to adapt to evolving fraud tactics. Experiments on large-scale datasets show TRE consistently improves performance over existing methods. [paper]
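
The retrieve-and-aggregate step described above can be illustrated with a minimal sketch. It is not the TRE implementation: the cosine similarity, the softmax weighting, the concatenation of the aggregated vector onto the query, and the names `cosine` and `enrich` are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def enrich(query, recent, k=2):
    """Concatenate `query` with a softmax-weighted mean of its top-k neighbors.

    Hypothetical helper: similarity, weighting, and fusion are assumed choices.
    """
    scored = sorted(
        ((cosine(query, r), r) for r in recent),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    if not scored:
        # No recent samples: pad with zeros so the output width stays fixed.
        return list(query) + [0.0] * len(query)
    top = max(s for s, _ in scored)
    weights = [math.exp(s - top) for s, _ in scored]
    total = sum(weights)
    aggregated = [
        sum(w / total * r[i] for w, (_, r) in zip(weights, scored))
        for i in range(len(query))
    ]
    return list(query) + aggregated

query = [1.0, 0.0]
recent = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # embeddings of recent samples
enriched = enrich(query, recent, k=1)
```

A downstream classifier would consume `enriched` instead of `query`, so the pool of recent samples can change at test time without retraining the encoder.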

Financial Risk Assessment via Long-term Payment Behavior Sequence Folding (ICDM 2024)

We propose LBSF, a method that folds long-term user payment behavior sequences by merchants to enable efficient and informative modeling for financial risk assessment. LBSF uses multi-field behavior encoding and aggregates behaviors at the merchant level, followed by relational learning across merchants. Experiments on large-scale datasets show LBSF effectively captures long-term behavioral patterns and improves the accuracy of user financial risk profiles. [paper]
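
As a rough illustration of folding a behavior sequence by merchant, the sketch below groups a chronological list of payments into per-merchant subsequences and then summarizes each one. The event layout `(merchant, amount)`, the helper names, and the count/total summary are assumptions; LBSF itself uses learned multi-field encoders rather than these hand-written statistics.

```python
def fold_by_merchant(events):
    """Group chronological (merchant, amount) events into per-merchant lists.

    Order within each merchant is preserved, so one long sequence becomes
    several short ones.
    """
    folded = {}
    for merchant, amount in events:
        folded.setdefault(merchant, []).append(amount)
    return folded

def summarize(folded):
    """Stand-in for a learned merchant-level aggregator (assumed, not LBSF's)."""
    return {
        merchant: {"count": len(amounts), "total": sum(amounts)}
        for merchant, amounts in folded.items()
    }

events = [("m1", 10.0), ("m2", 5.0), ("m1", 2.5)]
folded = fold_by_merchant(events)
summary = summarize(folded)
```

Folding shortens the sequence a model has to process from the number of payments to the number of distinct merchants, which is the efficiency argument made above.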

Online Conversion Rate Prediction via Multi-Interval Screening and Synthesizing under Delayed Feedback (AAAI 2024)

We propose MISS, a model for online CVR prediction that uses multi-interval screening with multiple output heads and a lightweight synthesizing module to aggregate their knowledge. MISS addresses delayed feedback and data bias, and achieves strong performance on real-world advertising datasets. [paper]

Online Conversion Rate Prediction via Neural Satellite Networks in Delayed Feedback Advertising (SIGIR 2023)

We propose DFSN, a model for online CVR prediction that addresses delayed feedback by assigning a long waiting window to the main model and integrating satellite networks to learn from fresh data using online transfer learning. This approach reduces fake negatives and improves data freshness. Experiments on real-world advertising datasets show DFSN outperforms existing methods. [paper]
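
The trade-off between waiting-window length and fake negatives can be made concrete with a small labeling sketch. The function `label_click`, the time unit (hours), and the three-way return value are assumptions for illustration and are not taken from the DFSN paper.

```python
def label_click(click_time, conversion_time, window, now):
    """Label a click under a fixed waiting window.

    Returns 1 if a conversion was observed within `window` of the click,
    0 if the window has elapsed without one, and None if still waiting.
    Times are in hours; `conversion_time` is None when no conversion exists.
    """
    if (
        conversion_time is not None
        and conversion_time <= now
        and conversion_time - click_time <= window
    ):
        return 1
    if now - click_time >= window:
        return 0
    return None

# A click at hour 0 that converts at hour 30, observed at hour 48.
short_window = label_click(0, 30, window=24, now=48)  # labeled negative: a fake negative
long_window = label_click(0, 30, window=72, now=48)   # labeled positive
# A fresh click at hour 40 cannot be labeled negative yet under the long window.
fresh = label_click(40, None, window=72, now=48)
```

A long window avoids the fake negative in the first case but leaves the fresh click unlabeled, which is the freshness gap that the satellite networks described above are meant to cover.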

Leveraging Post-Click User Behaviors for Calibrated Conversion Rate Prediction Under Delayed Feedback in Online Advertising (CIKM 2023, short paper)

We propose a method that uses post-click user behaviors to calibrate conversion rate predictions in online advertising, addressing bias from delayed feedback. By treating user behaviors as additional prediction targets and applying an adaptive loss function for multi-task learning, as well as a parameterized scaling technique, our approach achieves more accurate and timely calibration. Experiments on real-world datasets show improved calibration over existing methods. [paper]

Calibrated Conversion Rate Prediction via Knowledge Distillation under Delayed Feedback in Online Advertising (CIKM 2022)

We propose a calibration method for conversion rate prediction under delayed feedback using knowledge distillation. A teacher model learns from samples with complete feedback for long-term patterns, while a student model adapts to recent data to address data shift. A distillation loss aligns the student with the teacher. Experiments show our method delivers more calibrated predictions and generalizes across base models. [paper]
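
One common way to write a distillation objective of this kind is a weighted sum of a hard-label loss on recent data and a soft-label loss against the teacher's predictions. The sketch below assumes binary cross-entropy for both terms and a mixing weight `alpha`; the exact loss used in the paper may differ.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy of prediction p against a (possibly soft) target y."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def student_loss(student_p, labels, teacher_p, alpha=0.5):
    """Mix the loss on recent labels with a term pulling the student to the teacher.

    `alpha` is an assumed hyperparameter: 0 ignores the teacher, 1 ignores labels.
    """
    n = len(student_p)
    hard = sum(bce(p, y) for p, y in zip(student_p, labels)) / n
    soft = sum(bce(p, t) for p, t in zip(student_p, teacher_p)) / n
    return (1 - alpha) * hard + alpha * soft

# With alpha=1 only the distillation term remains, so matching the teacher is best.
aligned = student_loss([0.2], [0], [0.2], alpha=1.0)
misaligned = student_loss([0.6], [0], [0.2], alpha=1.0)
```

Here `aligned` is smaller than `misaligned`, showing that the soft term is minimized when the student reproduces the teacher's probability.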

User Behavior Pre-training for Online Fraud Detection (KDD 2022)

We propose UB-PTM, a pre-training model for online fraud detection that learns from large-scale unlabeled user behavior sequences at the action, intention, and sequence levels. UB-PTM leverages behavioral data to address the shortage of labels in newly launched services. Experiments on multiple fraud detection tasks show UB-PTM outperforms state-of-the-art task-specific models. [paper]

Selective Fairness in Recommendation via Prompts (SIGIR 2022, short paper)

We propose PFRec, a parameter-efficient prompt-based framework for fairness-aware recommendation that enables users to select which sensitive attributes (e.g., age, gender, occupation) should be bias-free. PFRec uses attribute-specific prompts with adversarial training to achieve selective fairness in sequential recommendation. Experiments demonstrate PFRec’s effectiveness across various attribute combinations. [paper] [code]

User-Centric Conversational Recommendation with Multi-Aspect User Modeling (SIGIR 2022)

We propose UCCR, a user-centric conversational recommender system that integrates users’ historical dialogue sessions and look-alike user information to enrich preference modeling. UCCR learns multi-view user preferences and their correlations through self-supervised objectives, and incorporates look-alike users via a temporal selector. Experiments on Chinese and English datasets show UCCR significantly improves both recommendation and dialogue generation over strong baselines. [paper] [code]

Multi-view Multi-behavior Contrastive Learning in Recommendation (DASFAA 2022)

We propose MMCLR, a framework for multi-behavior recommendation that uses multi-view contrastive learning to capture commonalities, multi-view consistency, and fine-grained differences among user behaviors. MMCLR introduces three contrastive learning tasks to address these challenges and achieves state-of-the-art performance on real-world datasets. [paper] [code]

ADAPT: Adversarial Domain Adaptation with Purifier Training for Cross-domain Credit Risk Forecasting (DASFAA 2022)

We propose ADAPT, an adversarial domain adaptation method with purifier training to address intra- and inter-domain imbalance in transfer learning. ADAPT resolves class and sample size imbalance and supports multi-source adaptation via weighted integration. Experiments on a large-scale credit risk dataset show ADAPT outperforms state-of-the-art methods and offers improved interpretability. [paper]

Follow the Prophet: Accurate Online Conversion Rate Prediction in the Face of Delayed Feedback (SIGIR 2021, Best Short Paper Honorable Mention)

In this paper, we propose to tackle the delayed feedback problem in online advertising by “Following the Prophet” (FTP for short). The key insight is that, if the feedback came instantly for all the logged samples, we could get a model without delayed feedback, namely the “prophet”. Although the prophet cannot be obtained during online learning, we show that we could predict the prophet’s predictions by an aggregation policy on top of a set of multi-task predictions, where each task captures the feedback patterns of different periods. [paper]

GuideBoot: Guided Bootstrap for Deep Contextual Bandits in Online Advertising (WWW 2021)

In this paper, we introduce Guided Bootstrap (GuideBoot), which provides explicit guidance to the exploration behavior by training multiple models over both real and noisy samples with fake labels, where the noise is added according to the predictive uncertainty. The proposed method is efficient as it can make decisions on-the-fly by utilizing only one randomly chosen model, but is also effective as we show that it can be viewed as a non-Bayesian approximation of Thompson sampling. Moreover, we extend it to an online version that can learn solely from streaming data, which is favored in real applications. [paper]

Towards Explainable Conversational Recommendation (IJCAI 2020)

In this paper, we introduce explainable conversational recommendation, which enables incremental improvement of both recommendation accuracy and explanation quality through multi-turn user-model conversation. We design an incremental multi-task learning framework that enables tight collaboration between recommendation prediction, explanation generation, and user feedback integration. We also propose a multi-view feedback integration method to enable effective incremental model updates. Empirical results demonstrate that our model not only consistently improves recommendation accuracy but also generates explanations that fit the user interests reflected in the feedback. [paper]

Field-aware Calibration: A Simple and Empirically Strong Method for Reliable Probabilistic Predictions (WWW 2020)

In this paper, we introduce a new evaluation metric, field-level calibration error, which measures the bias in predictions over a sensitive input field that the decision-maker cares about. We then propose Neural Calibration, a simple yet powerful post-hoc calibration method that learns to calibrate by making full use of field-aware information on the validation set. [paper]
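
To show what a field-level calibration error can capture, the sketch below sums the residuals (label minus prediction) within each value of a chosen field, takes absolute values, and normalizes by the dataset size. This formula is one plausible reading of the metric and the function name is hypothetical; consult the paper for the exact definition.

```python
from collections import defaultdict

def field_calibration_error(preds, labels, field_values):
    """Average absolute per-field bias, weighted by field frequency.

    Assumed form: (1 / N) * sum over field values z of
    |sum of (y_i - p_i) for samples whose field value is z|.
    """
    residual = defaultdict(float)
    for p, y, z in zip(preds, labels, field_values):
        residual[z] += y - p
    return sum(abs(r) for r in residual.values()) / len(preds)

preds = [0.5, 0.5, 0.5, 0.5]
labels = [1, 1, 0, 0]
# The mean prediction equals the mean label, so the model looks calibrated
# overall, yet it under-predicts on field "a" and over-predicts on field "b".
biased = field_calibration_error(preds, labels, ["a", "a", "b", "b"])
# With positives and negatives spread evenly across fields, the bias cancels.
balanced = field_calibration_error(preds, labels, ["a", "b", "a", "b"])
```

The first grouping yields an error of 0.5 and the second yields 0.0, even though the predictions and labels are identical, which is why a field-aware view can expose bias that a global calibration check misses.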

Warm Up Cold-start Advertisements: Improving CTR Predictions via Learning to Learn ID Embeddings (SIGIR 2019)

In this work, we aim to improve CTR prediction during both the cold-start phase and the warm-up phase. We propose an approach called Meta-Embedding that learns how to learn better embeddings for new ad IDs to address the cold-start problem. The embedding generator trained by this method can also speed up model fitting by replacing the trivial random initializer for new ID embeddings, thereby warming up the new ad faster. [paper] [code]

Attention-driven Factor Model for Explainable Personalized Recommendation (SIGIR 2018, short paper)

In this work, we propose the Attention-driven Factor Model (AFM), which integrates item features according to users’ attention and gives reasonable explanations for users’ preferences while maintaining high prediction accuracy. We use Gated Attention Units to extract users’ explicit preferences. By exploiting both ratings and item features, the algorithm accounts for differences in individual users’ attention and shows good efficiency and accuracy in experiments. [paper]