Reinforcement Learning from Human Feedback (RLHF) has emerged as a powerful technique for aligning large language models (LLMs) with human preferences. However, effectively aligning LLMs with diverse human preferences remains a significant challenge, particularly when those preferences conflict. To address this issue, we frame human value alignment as a multi-objective optimization problem, aiming to maximize a set of potentially conflicting objectives. We introduce Gradient-Adaptive Policy Optimization (GAPO), a novel fine-tuning paradigm that employs multiple-gradient descent to align LLMs with diverse preference distributions. GAPO adaptively rescales the gradients for each objective to determine an update direction that optimally balances the trade-offs between objectives. Additionally, we introduce P-GAPO, which incorporates user preferences across different objectives and achieves Pareto solutions that better align with the user's specific needs.
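The core idea of balancing conflicting gradients can be illustrated with the standard two-objective multiple-gradient descent step: find the minimum-norm convex combination of the two objective gradients, which is a common-descent direction when one exists. This is a minimal sketch of that classic closed form, not GAPO's exact rescaling rule (which the abstract does not specify); the toy gradients below are hypothetical.

```python
import numpy as np

def mgda_direction(g1, g2):
    """Minimum-norm convex combination of two objective gradients
    (two-task multiple-gradient descent). The result is a common
    ascent/descent direction for both objectives when one exists."""
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:  # gradients already coincide
        return g1.copy()
    # weight on g1 minimising ||a*g1 + (1-a)*g2||, clipped to [0, 1]
    a = np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0)
    return a * g1 + (1.0 - a) * g2

# toy example: two fully conflicting objectives
g_helpful = np.array([1.0, 0.0])
g_harmless = np.array([0.0, 1.0])
d = mgda_direction(g_helpful, g_harmless)  # balanced update direction
```

With orthogonal gradients the balanced direction weights both objectives equally; a preference-weighted variant (as in P-GAPO) would bias this combination toward the user's priorities.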
We found that truly personalized news headlines depend not only on what a user is interested in but also on their preferred stylistic delivery, a crucial and largely ignored dimension in previous work. We therefore introduced SCAPE, a novel framework that leverages Large Language Models to infer and embed the stylistic and content attributes of headlines, enabling a more nuanced analysis of a user's reading history. It then adaptively fuses these "panoramic interests" to guide the generation process, resulting in headlines that demonstrably better capture an individual's taste in both topic and tone. [paper] [code]
In this work, we propose Generation with Concept Activation Vector (GCAV), a lightweight framework for controlling large language model (LLM) outputs. GCAV enables fine-grained control without heavy fine-tuning. It first trains concept activation vectors for specific concepts, like toxicity. During inference, it manipulates these vectors within LLMs—for instance, removing the toxicity vector to reduce harmful output. [paper]
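The steering mechanism can be sketched in a few lines: estimate a concept direction from activations with and without the concept, then shift hidden states along (or against) that direction at inference time. This is a simplified difference-of-means illustration, not GCAV's actual training procedure; the toy activations are hypothetical.

```python
import numpy as np

def concept_vector(pos_acts, neg_acts):
    """Difference-of-means concept direction: points from concept-absent
    activations toward concept-present ones (unit length)."""
    v = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return v / np.linalg.norm(v)

def steer(hidden, cav, strength):
    """Shift a hidden state along the concept direction; a negative
    strength suppresses the concept (e.g. toxicity) at inference time."""
    return hidden + strength * cav

# toy activations: the concept lives along the first coordinate
toxic = np.array([[2.0, 0.1], [1.8, -0.1]])
clean = np.array([[0.0, 0.1], [0.2, -0.1]])
cav = concept_vector(toxic, clean)
detoxified = steer(np.array([1.0, 1.0]), cav, -1.0)
```

In practice the vector would be applied to a transformer's intermediate hidden states, with the steering strength calibrated per layer.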
In this paper, we revisit the evaluation of in-context learning (ICL) with large language models, highlighting a critical yet overlooked factor—the cost of configuring demonstrations. Our study reveals a strong correlation between configuration cost and task performance, exposing unfairness in existing evaluation protocols. To address this, we propose a two-dimensional evaluation paradigm that jointly considers both performance and cost. Moreover, we introduce a simple, generalizable strategy that balances these factors effectively and enhances ICL across models of different sizes. [paper]
We introduce the Event-Level Financial Sentiment Analysis (EFSA) task, which extracts quintuples—(company, industry, coarse-grained event, fine-grained event, sentiment)—from financial texts by reformulating event extraction as a classification problem with hierarchical event categories. We present a large-scale Chinese dataset with 12,160 news articles and 13,725 annotated quintuples, and propose a four-hop Chain-of-Thought LLM-based method. Extensive experiments on this benchmark demonstrate that our approach achieves state-of-the-art results. [paper] [code]
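The quintuple schema described above can be captured as a simple data structure; the field names follow the abstract, while the example values are purely hypothetical.

```python
from dataclasses import dataclass

@dataclass
class EFSAQuintuple:
    """One extracted record: (company, industry, coarse-grained event,
    fine-grained event, sentiment), with events drawn from a hierarchy."""
    company: str
    industry: str
    coarse_event: str
    fine_event: str
    sentiment: str  # e.g. "positive" / "neutral" / "negative"

# hypothetical extraction from a financial news article
q = EFSAQuintuple(
    company="AcmeCorp",
    industry="semiconductors",
    coarse_event="corporate action",
    fine_event="share buyback",
    sentiment="positive",
)
```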
Row-based methods for question answering over tables and text often fail on complex questions because they struggle to compare information across multiple rows and cannot adapt to the shifting focus required in multi-step reasoning. To solve this, we propose DRAMA, a framework that constructs a multi-granularity graph and uses a novel memory bank to efficiently incorporate comparative information from many different rows. Furthermore, DRAMA features a Dynamic Graph Attention Network that adjusts to the question's changing focus, allowing it to effectively filter noise and achieve state-of-the-art performance on complex, multi-hop QA tasks. [paper]
In this work, we propose Distillation with Explanations from LLMs, a mechanism for distilling LLMs on multi-choice QA tasks. While LLMs like ChatGPT offer a cost-effective way to label data and can provide explanations alongside answers, their inaccuracies risk introducing noise. Our approach capitalizes on the consistency between LLMs' incorrect answers and their corresponding explanations, integrating ground truth labels with LLM-generated content. This enables the simultaneous generation of more precise answers and consistent free-text explanations. [paper]
This paper explores the challenges of applying retrieval-augmented methods in few-shot scenarios, where limited retrieval space makes off-the-shelf metrics ineffective. We show that learning a task-specific retrieval metric is essential, but standard cross-entropy loss struggles with weak supervision and gradient vanishing. To overcome this, we propose two new training objectives—EM-L and R-L—that offer stronger task-specific guidance. Experiments on 10 datasets demonstrate the effectiveness of our method across various few-shot tasks. [paper]
Existing methods for personalized news headline generation often compromise factual consistency in pursuit of user engagement, leading to potentially misleading or incorrect titles. To address this trade-off, we propose the Fact-Preserved Personalized News Headline Generation (FPG) framework, which leverages user click history to selectively highlight facts within the source article that align with the reader's interests. By further incorporating a fact-enhanced training phase based on contrastive learning, FPG successfully generates headlines that are demonstrably both highly personalized and factually sound. [paper] [code]
In this paper, we aim to adapt the idea of retrieval-based neural approaches to the Aspect Sentiment Triplet Extraction (ASTE) task. We propose a novel retrieval-based method, RLI, for ASTE. Unlike prior approaches that retrieve semantically similar examples, RLI leverages both semantic and sentiment label similarity to improve performance. It interpolates label information from retrieved triplets into the representation of target pairs, with a retriever jointly trained under distant supervision. Our method achieves new state-of-the-art results on two standard ASTE benchmarks. [paper] [code]
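The label-interpolation step can be sketched as follows: weight the label embeddings of retrieved triplets by retrieval similarity and append the mixture to the target pair's representation. This is an illustrative simplification under assumed shapes, not RLI's exact architecture; all names here are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def interpolate_labels(pair_rep, label_embs, sims):
    """Mix retrieved triplets' label embeddings by retrieval similarity
    and concatenate the mixture onto the target pair representation."""
    w = softmax(np.asarray(sims))
    return np.concatenate([pair_rep, w @ label_embs])

# hypothetical: a 2-d pair representation and two retrieved label embeddings
pair_rep = np.array([1.0, 2.0])
label_embs = np.array([[1.0, 0.0], [0.0, 1.0]])
enriched = interpolate_labels(pair_rep, label_embs, sims=[0.0, 0.0])
```

In the full method the retriever producing the similarities is trained jointly under distant supervision, so the mixture weights adapt to the task.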
Improving Table Question Answering through data augmentation has traditionally relied on "top-down" methods using handcrafted templates and rules, which are often limited in coverage and expensive to create. To overcome this, we propose SIG-TQA, a "bottom-up" framework that instead automatically extracts diverse semantic patterns directly from existing text-to-SQL datasets with minimal human effort. These data-driven patterns then guide the generation of new, high-quality question-SQL pairs, demonstrably boosting the performance of Table QA parsers in both academic benchmarks and real-world industry applications. [paper]
In this paper, we introduce the multilingual knowledge graph (KG) to the CLIR task, since KGs provide rich information about entities in multiple languages. We propose a model named CLIR with Hierarchical Knowledge Enhancement (HIKE) for this task. The proposed model encodes the textual information in queries, documents, and the KG with multilingual BERT, and incorporates the KG information into the query-document matching process with a hierarchical information fusion mechanism. Experimental results demonstrate that HIKE achieves substantial improvements over state-of-the-art competitors.
In this paper, we investigate the unified ABSA task from the perspective of Machine Reading Comprehension (MRC), observing that aspect and opinion terms can serve interchangeably as the query and answer in MRC. We propose a new paradigm named Role Flipped Machine Reading Comprehension (RF-MRC) to resolve it. At its heart, the predicted results of either Aspect Term Extraction (ATE) or Opinion Terms Extraction (OTE) are regarded as the queries, and the matched opinion or aspect terms are extracted as answers. The queries and answers can be flipped for multi-hop detection. Finally, every matched aspect-opinion pair is predicted by the sentiment classifier. RF-MRC can solve the ABSA task without extra data annotation. Experiments on three widely used benchmarks and a challenging dataset demonstrate the superiority of the proposed framework. [paper]
In this paper, we formulate the personalized news headline generation problem, whose goal is to output a user-specific title based on both a user's reading interests and a candidate news body to be shown to them. To build a benchmark for this problem, we release a large-scale dataset named PENS. The training set is collected from user impression logs of Microsoft News, and the test set is manually created by hundreds of native speakers to enable a fair testbed for evaluating models in an offline mode. We propose a generic framework as a preparatory solution to our problem, and implement several state-of-the-art user modeling methods within it to establish benchmark scores for the proposed dataset. The dataset is available at https://msnews.github.io/pens.html. [paper][code]
In this paper, we propose a multi-task learning approach named MIN that makes flexible use of sub-tasks for unified ABSA. We divide the sub-tasks of ABSA into extractive sub-tasks and classification sub-tasks, and optimize them in a unified manner with multiplex interaction mechanisms. Specifically, we devise a pairwise attention mechanism to exploit bidirectional interactions between any pair of extractive sub-tasks, and a consistency-weighting mechanism to perform unidirectional interaction from an extractive sub-task to a classification sub-task. Since the proposed interaction mechanisms are task-agnostic, our model also works well when some specific sub-tasks are absent. [paper][code]
In this paper, we aim to improve ATSA by discovering the potential aspect terms of the predicted sentiment polarity when the aspect terms of a test sentence are unknown. We approach this goal by proposing a capsule network based model named CAPSAR. In CAPSAR, sentiment categories are denoted by capsules, and aspect term information is injected into the sentiment capsules through a sentiment-aspect reconstruction procedure during training. As a result, coherent patterns between aspects and sentimental expressions are encapsulated by these sentiment capsules. Experiments on three widely used benchmarks demonstrate that these patterns can be used to explore aspect terms from a test sentence when only the sentence is fed to the model. [paper]
In this paper, we propose a neural model named TWASP for joint CWS and POS tagging, following the character-based sequence labeling paradigm, where a two-way attention mechanism incorporates both context features and their corresponding syntactic knowledge for each input character. In particular, we use existing language processing toolkits to obtain auto-analyzed syntactic knowledge for the context, and the proposed attention module can learn and benefit from it even though its quality may not be perfect. Our experiments illustrate the effectiveness of the two-way attention for joint CWS and POS tagging, achieving state-of-the-art performance on five benchmark datasets. [paper]
In this paper, we present a new large-scale Multi-Aspect Multi-Sentiment (MAMS) dataset, in which each sentence contains at least two different aspects with different sentiment polarities. The release of this dataset should push forward research in this field. In addition, we propose simple yet effective CapsNet and CapsNet-BERT models that combine the strengths of recent NLP advances. Experiments on our new dataset show that the proposed models significantly outperform the state-of-the-art baseline methods. [paper]
In this work, we re-examine extractive text summarization by simulating how humans extract summaries. We adopt a convolutional neural network to encode the gist of paragraphs for rough reading, and a decision-making policy with an adapted termination mechanism for careful reading. [paper][code]
In this work, we present an unsupervised neural framework that leverages sememes to enhance lexical semantics. We propose a sememe attention structure to represent word meanings and add an RNN sentence encoder to guide the sememe exploration. The experimental results show that our model is superior to existing models, especially in identifying infrequent aspects. [paper][code]
We propose an interpretable framework coined FISHQA (FInancial Sentiment analysis network with Hierarchical Query-driven Attention) for financial sentiment analysis. Multiple user-specified queries distill the document representation through a query-based attention mechanism. The experiments demonstrate that our framework learns better document representations, uncovers meaningful clues that respond to different users' preferences, and outperforms state-of-the-art methods. [paper][code]
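The query-driven attention at the heart of such a framework can be sketched as a softmax attention of one query vector over word representations, yielding a query-specific document representation. This is a minimal illustration under assumed vector shapes, not FISHQA's full hierarchical architecture; the inputs below are hypothetical.

```python
import numpy as np

def query_attention(word_vecs, query_vec):
    """Softmax attention of one user query over word vectors; the
    attention-weighted sum is the query-specific document representation."""
    scores = word_vecs @ query_vec
    weights = np.exp(scores - scores.max())  # subtract max for stability
    weights /= weights.sum()
    return weights @ word_vecs

# toy document of two one-hot "words"; the query strongly matches word 0
doc_rep = query_attention(np.eye(2), np.array([10.0, 0.0]))
```

In the hierarchical version, the same mechanism is applied at the word and sentence levels, once per user query, and the resulting representations are combined for classification.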