ENRICHing medical imaging training sets enables more efficient machine learning.

Journal: Journal of the American Medical Informatics Association : JAMIA
PMID:

Abstract

OBJECTIVE: Deep learning (DL) has been applied in proofs of concept across biomedical imaging, spanning imaging modalities and medical specialties. Labeled data are critical to training and testing DL models, but the supply of human expert labelers is limited. In addition, DL traditionally requires copious training data, which are computationally expensive to process and iterate over. Consequently, it is useful to prioritize those images that are most likely to improve a model's performance, a practice known as instance selection. The challenge is determining how best to prioritize. It is natural to prefer straightforward, robust, quantitative metrics as the basis for instance selection. However, in current practice, such metrics are not tailored to, and are almost never used for, image datasets.

Authors

  • Erin Chinn
    Department of Medicine, Division of Cardiology, Department of Radiology, Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California, USA.
  • Rohit Arora
  • Ramy Arnaout
  • Rima Arnaout