Tailoring task arithmetic to address bias in models trained on multi-institutional datasets.
Journal:
Journal of Biomedical Informatics
Published Date:
Jun 8, 2025
Abstract
OBJECTIVE: Multi-institutional datasets are widely used for machine learning from clinical data to increase dataset size and improve generalization. However, deep learning models in particular may learn to recognize the source of a data element, leading to biased predictions. For example, deep learning models for image recognition trained on chest radiographs, with COVID-19 positive and negative examples drawn from different data sources, can respond to indicators of provenance (e.g., radiological annotations outside the lung area that reflect institution-specific practices) rather than to pathology, and consequently generalize poorly beyond their training data. Bias of this sort, called confounding by provenance, is of concern in natural language processing (NLP) because provenance indicators (e.g., institution-specific section headers or region-specific dialects) are pervasive in language data. Prior work on addressing such bias has focused on statistical methods, without providing a solution for deep learning NLP models.
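For context on the technique named in the title: task arithmetic edits a model by adding or subtracting "task vectors", i.e., parameter-wise differences between a fine-tuned model and its base model. The sketch below is a minimal, hedged illustration in PyTorch of that general idea, assuming three models that share one architecture; the specific debiasing recipe shown in the comments (subtracting a hypothetical "provenance" task vector) is an assumption for illustration only, not necessarily the procedure developed in this paper.

```python
# Minimal sketch of task arithmetic on model weights (PyTorch state_dicts).
# Assumption for illustration: a "provenance" task vector is subtracted to
# suppress source-identifying behavior; this is not claimed to be the
# paper's exact method.

import torch


def task_vector(finetuned_sd, base_sd):
    """Task vector = fine-tuned weights minus base weights, parameter-wise."""
    return {k: finetuned_sd[k] - base_sd[k] for k in base_sd}


def apply_vector(base_sd, vector, scale=1.0):
    """Return new weights: base weights plus a scaled task vector."""
    return {k: base_sd[k] + scale * vector[k] for k in base_sd}


# Hypothetical usage (names are placeholders):
#   base_sd       - pretrained model weights
#   clinical_sd   - weights fine-tuned on the clinical prediction task
#   provenance_sd - weights fine-tuned to predict the data source (institution)
#
# tau_task = task_vector(clinical_sd, base_sd)
# tau_prov = task_vector(provenance_sd, base_sd)
#
# Add the clinical task vector, then subtract a scaled provenance vector;
# the scaling factor would be tuned on held-out data.
# debiased_sd = apply_vector(apply_vector(base_sd, tau_task, 1.0), tau_prov, -0.5)
# model.load_state_dict(debiased_sd)
```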