Preventing dataset shift from breaking machine-learning biomarkers.

Journal: GigaScience

Published Date: Sep 28, 2021

Abstract

Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals. Dataset shifts are frequent in biomedical research, e.g., because of recruitment biases. When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers. This article provides an overview of when and how dataset shifts break machine-learning-extracted biomarkers, as well as detection and correction strategies.

Authors

Jérôme Dockès
McGill University, 845 Sherbrooke St W, Montreal, Quebec H3A 0G4, Canada.

Gael Varoquaux
Parietal, INRIA, NeuroSpin, bat 145 CEA Saclay, 91191, Gif sur Yvette, France.

Jean-Baptiste Poline
McGill University, 845 Sherbrooke St W, Montreal, Quebec H3A 0G4, Canada.

Keywords

Algorithms; Biomarkers; Humans; Machine Learning

External Resources

PubMed ID: 34585237
