Learning fair representation for fine-tuning pre-trained language models.

Journal: Neural networks : the official journal of the International Neural Network Society

Published Date: Feb 9, 2026

Abstract

Pre-trained language models (PLMs) have achieved remarkable success across a wide range of natural language processing tasks, including text classification, machine translation, and question answering, by leveraging vast amounts of unlabeled data to learn rich linguistic representations. However, existing models often reflect human-like biases and societal stereotypes, which poses a significant challenge for their application. To address this issue, this paper proposes a novel debiasing framework called CFPLM. Unlike conventional debiasing methods, CFPLM is grounded in causal inference: it aims to identify and intervene on the factors that contribute to bias, thereby eliminating the bias in PLMs. The framework uses a composite loss function that introduces a fairness penalty term to regulate the model's learning process, and it integrates an adversarial loss and entropy regularization to further optimize model performance. Experiments on standard datasets and evaluation metrics show that CFPLM significantly reduces bias in BERT, RoBERTa, and ALBERT, while results on the GLUE benchmark indicate that enhancing model fairness does not compromise the models' language understanding capabilities.
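
The abstract names the ingredients of the composite loss (a task objective, a fairness penalty, an adversarial loss, and entropy regularization) but not their exact form. The following is a minimal sketch of how such terms might be combined, not the CFPLM implementation: the function names, the squared-gap fairness penalty, the signs of the adversarial and entropy terms, and the weights `lam_fair`, `lam_adv`, and `lam_ent` are all assumptions made for illustration.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under a probability vector."""
    return -math.log(probs[label])


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def composite_loss(task_probs, label, fairness_gap, adv_probs, protected_label,
                   lam_fair=1.0, lam_adv=1.0, lam_ent=0.1):
    """Hypothetical composite objective for the main model.

    task_probs      : predicted class probabilities for the downstream task
    fairness_gap    : some scalar measure of group disparity (assumed given)
    adv_probs       : an adversary's predicted probabilities for the
                      protected attribute, computed from the representation
    protected_label : true protected-attribute index
    """
    task = cross_entropy(task_probs, label)
    # Assumption: the fairness penalty is the squared disparity.
    fair = fairness_gap ** 2
    # Assumption: the main model is rewarded when the adversary does badly,
    # so the adversary's loss enters with a negative sign.
    adv = cross_entropy(adv_probs, protected_label)
    # Assumption: entropy regularization rewards less over-confident outputs.
    ent = entropy(task_probs)
    return task + lam_fair * fair - lam_adv * adv - lam_ent * ent


if __name__ == "__main__":
    loss = composite_loss([0.7, 0.3], 0, fairness_gap=0.2,
                          adv_probs=[0.5, 0.5], protected_label=1)
    print(f"composite loss: {loss:.4f}")
```

In this sketch a larger group disparity raises the loss, while an adversary that cannot recover the protected attribute lowers it. In practice each term would be computed on batches with an autodiff framework, and the weights would be tuned.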
