Learning fair representation for fine-tuning pre-trained language models.

Journal: Neural networks : the official journal of the International Neural Network Society

Abstract

Pre-trained language models (PLMs) have achieved remarkable success across a wide range of natural language processing tasks, including text classification, machine translation, and question answering, by leveraging vast amounts of unlabeled data to learn rich linguistic representations. However, existing models often reproduce human biases and societal stereotypes, which poses a significant challenge to their application. To address this issue, this paper proposes a novel debiasing framework called CFPLM. Unlike conventional debiasing methods, CFPLM is grounded in causal inference: it aims to identify and intervene on the factors that contribute to bias, thereby eliminating bias in PLMs. The framework uses a composite loss function that adds a fairness penalty term to regulate the model's learning process, and it further integrates an adversarial loss and entropy regularization to optimize model performance. Experiments on standard datasets with standard evaluation metrics show that CFPLM significantly reduces bias in BERT, RoBERTa, and ALBERT, while results on the GLUE benchmark indicate that improving model fairness does not compromise the models' language understanding capabilities.
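
The abstract names the ingredients of the composite loss (a task objective, a fairness penalty, an adversarial loss, and entropy regularization) but gives no formula. The following is a minimal illustrative sketch of how such terms might be combined, not the authors' implementation. Everything specific in it is an assumption: the demographic-parity gap used as the fairness penalty, the signs and weights (`lam_fair`, `lam_adv`, `lam_ent`), the binary protected attribute `group`, and all function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable row-wise softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the true labels.
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def fairness_penalty(probs, group):
    # ASSUMPTION: demographic-parity gap, i.e. the absolute difference in
    # mean predicted probability of class 1 between the two groups.
    p_pos = probs[:, 1]
    return float(abs(p_pos[group == 0].mean() - p_pos[group == 1].mean()))

def mean_entropy(probs):
    # Mean Shannon entropy of the predicted distributions.
    return float(-np.mean(np.sum(probs * np.log(probs + 1e-12), axis=1)))

def composite_loss(task_logits, labels, adv_logits, group,
                   lam_fair=1.0, lam_adv=0.5, lam_ent=0.1):
    # ASSUMPTION: the encoder is rewarded when an adversary that predicts
    # the protected attribute does badly (high loss, high entropy), so
    # both adversary terms enter with a negative sign.
    task_probs = softmax(task_logits)
    adv_probs = softmax(adv_logits)
    l_task = cross_entropy(task_probs, labels)
    l_fair = fairness_penalty(task_probs, group)
    l_adv = cross_entropy(adv_probs, group)
    l_ent = mean_entropy(adv_probs)
    total = l_task + lam_fair * l_fair - lam_adv * l_adv - lam_ent * l_ent
    parts = {"task": l_task, "fair": l_fair, "adv": l_adv, "ent": l_ent}
    return total, parts

# Toy usage with random logits for a batch of 8 examples.
rng = np.random.default_rng(0)
labels = np.array([0, 1, 0, 1, 1, 0, 1, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
total, parts = composite_loss(rng.normal(size=(8, 2)), labels,
                              rng.normal(size=(8, 2)), group)
print(total, parts)
```

In a real fine-tuning setup these terms would be computed on differentiable tensors and the adversary would be trained in alternation with the encoder; the NumPy version only shows how the terms could be weighted and summed.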
