Preparing Medical Imaging Data for Machine Learning.

Journal: Radiology
Published Date:

Abstract

Artificial intelligence (AI) continues to garner substantial interest in medical imaging. The potential applications are vast and include the entirety of the medical imaging life cycle from image creation to diagnosis to outcome prediction. The chief obstacles to development and clinical implementation of AI algorithms include availability of sufficiently large, curated, and representative training data that includes expert labeling (eg, annotations). Current supervised AI methods require a curation process for data to optimally train, validate, and test algorithms. Currently, most research groups and industry have limited data access based on small sample sizes from small geographic areas. In addition, the preparation of data is a costly and time-intensive process, the results of which are algorithms with limited utility and poor generalization. In this article, the authors describe fundamental steps for preparing medical imaging data in AI algorithm development, explain current limitations to data curation, and explore new approaches to address the problem of data availability.

Authors

  • Martin J Willemink
    Departments of Radiology (L.D.H., G.M., K.H., M.K., M.J.W., A.M.S., D.F.) and Surgery (M.F.), Stanford University School of Medicine, 300 Pasteur Dr, Room S-072, Stanford, CA 94305-5105.
  • Wojciech A Koszek
    From the Department of Radiology, Stanford University School of Medicine, 300 Pasteur Dr, S-072, Stanford, CA 94305-5105 (M.J.W., D.F., D.L.R., M.P.L.); Segmed, Menlo Park, Calif (M.J.W., W.A.K., C.H., J.W.); School of Engineering, Stanford University, Stanford, Calif (J.W.); Institute of Cognitive Neuroscience, University College London, London, England (H.H.); Radiology and Imaging Sciences, National Institutes of Health Clinical Center, Bethesda, Md (L.R.F.); Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, National Institutes of Health, Clinical Center, Bethesda, Md (R.M.S.); Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, Calif (D.L.R.); and Stanford Center for Artificial Intelligence in Medicine and Imaging (AIMI), Stanford, Calif (M.P.L.).
  • Cailin Hardell
    From the Department of Radiology, Stanford University School of Medicine, 300 Pasteur Dr, S-072, Stanford, CA 94305-5105 (M.J.W., D.F., D.L.R., M.P.L.); Segmed, Menlo Park, Calif (M.J.W., W.A.K., C.H., J.W.); School of Engineering, Stanford University, Stanford, Calif (J.W.); Institute of Cognitive Neuroscience, University College London, London, England (H.H.); Radiology and Imaging Sciences, National Institutes of Health Clinical Center, Bethesda, Md (L.R.F.); Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, National Institutes of Health, Clinical Center, Bethesda, Md (R.M.S.); Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, Calif (D.L.R.); and Stanford Center for Artificial Intelligence in Medicine and Imaging (AIMI), Stanford, Calif (M.P.L.).
  • Jie Wu
    Center of Disease Control of Qingdao, 175 Shandong Road, Qingdao, Shandong, 266001, China.
  • Dominik Fleischmann
    Departments of Radiology (L.D.H., G.M., K.H., M.K., M.J.W., A.M.S., D.F.) and Surgery (M.F.), Stanford University School of Medicine, 300 Pasteur Dr, Room S-072, Stanford, CA 94305-5105.
  • Hugh Harvey
    Institute of Cognitive Neurosciences, University College London, Alexandra House, 17-19 Queen Square, Bloomsbury, London WC1N 3AZ, England.
  • Les R Folio
    Radiology and Imaging Sciences, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA.
  • Ronald M Summers
    National Institutes of Health, Clinical Center, Radiology and Imaging Sciences, 10 Center Drive, Bethesda, MD 20892, USA.
  • Daniel L Rubin
    Department of Biomedical Data Science, Stanford University School of Medicine Medical School Office Building, Stanford CA 94305-5479.
  • Matthew P Lungren