Potential limitations in COVID-19 machine learning due to data source variability: A case study in the nCov2019 dataset.
Journal:
Journal of the American Medical Informatics Association : JAMIA
PMID:
33027509
Abstract
OBJECTIVE: The lack of representative coronavirus disease 2019 (COVID-19) data is a bottleneck for reliable and generalizable machine learning. Data sharing alone is insufficient without data quality, in which variability among data sources plays an important role. We showcase and discuss potential biases arising from data source variability in COVID-19 machine learning.