Impact of Noisy Supervision in Foundation Model Learning.

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence

Published Date: Jul 1, 2025

Abstract

Foundation models are usually pre-trained on large-scale datasets and then adapted to different downstream tasks through tuning. This pre-train-then-fine-tune paradigm has become standard practice in deep learning. However, the large-scale pre-training datasets, which are often inaccessible or too expensive to handle, can contain label noise that may adversely affect the generalization of the model and pose unexpected risks. This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets and then effectively mitigate its impact on downstream tasks. Specifically, through extensive experiments with fully-supervised and image-text contrastive pre-training on synthetic noisy ImageNet-1K, YFCC15M, and CC12M datasets, we demonstrate that, while slight noise in pre-training can benefit in-domain (ID) performance, where the training and testing data share a similar distribution, it always deteriorates out-of-domain (OOD) performance, where the training and testing distributions are significantly different. These observations are agnostic to the scale of the pre-training datasets, pre-training noise types, model architectures, pre-training objectives, downstream tuning methods, and downstream applications. We empirically ascertain that the reason behind this is that the pre-training noise shapes the feature space differently. We then propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization. The method is applicable in both parameter-efficient and black-box tuning settings, since one may not be able to access or fully fine-tune the pre-trained models. We additionally conduct extensive experiments on popular vision and language models, including APIs, that were pre-trained with supervised and self-supervised objectives on realistic noisy data. Our analysis and results demonstrate the importance of this novel and fundamental research direction, which we term Noisy Model Transfer Learning.
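As an illustration of what a synthetic noisy label set can look like, the sketch below corrupts a chosen fraction of class labels by flipping each selected label to a different class chosen uniformly at random. The abstract does not specify how the noise was generated, so this symmetric flipping scheme is an assumption, and the function name `inject_symmetric_label_noise` and its parameters are hypothetical rather than taken from the paper.

```python
import random


def inject_symmetric_label_noise(labels, num_classes, noise_ratio, seed=0):
    """Return a copy of `labels` with a fraction `noise_ratio` flipped.

    Assumption (not from the paper): noise is symmetric, i.e. each
    corrupted label is replaced by a different class chosen uniformly.
    """
    if not 0.0 <= noise_ratio <= 1.0:
        raise ValueError("noise_ratio must be in [0, 1]")
    if num_classes < 2:
        raise ValueError("need at least two classes to flip a label")

    rng = random.Random(seed)
    noisy = list(labels)

    # Pick which examples to corrupt, without replacement.
    num_noisy = int(round(noise_ratio * len(noisy)))
    flip_indices = rng.sample(range(len(noisy)), num_noisy)

    for i in flip_indices:
        # A non-zero offset modulo num_classes always yields a new class.
        offset = rng.randrange(1, num_classes)
        noisy[i] = (noisy[i] + offset) % num_classes
    return noisy


if __name__ == "__main__":
    clean = [i % 10 for i in range(1000)]
    noisy = inject_symmetric_label_noise(clean, num_classes=10, noise_ratio=0.1)
    changed = sum(c != n for c, n in zip(clean, noisy))
    print(f"{changed} of {len(clean)} labels were flipped")
```

Because every selected label is guaranteed to change, the realized noise rate equals the requested one (up to rounding), which makes it straightforward to sweep noise levels in a controlled comparison.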

Authors

Hao Chen
The First School of Medicine, Wenzhou Medical University, Wenzhou, China.

Zihan Wang
Graduate School, Beijing University of Chinese Medicine, Beijing, China.

Ran Tao
Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, Maryland, USA.

Hongxin Wei
MLR Lab, Southern University of Science and Technology.

Xing Xie
Microsoft Research, China. Electronic address: xing.xie@microsoft.com.

Masashi Sugiyama

Bhiksha Raj

Jindong Wang
Microsoft Research, China.

External Resources

PubMed ID: 40117144
