Open access image repositories: high-quality data to enable machine learning research.

Journal: Clinical radiology
Published Date:

Abstract

Originally motivated by the need for research reproducibility and data reuse, large-scale, open access information repositories have become key resources for training and testing of advanced machine learning applications in biomedical and clinical research. To be of value, such repositories must provide large, high-quality data sets, where quality is defined as minimising variance due to data collection protocols and data misrepresentations. Curation is the key to quality. We have constructed a large public access image repository, The Cancer Imaging Archive, dedicated to the promotion of open science to advance the global effort to diagnose and treat cancer. Drawing on this experience and our experience in applying machine learning techniques to the analysis of radiology and pathology image data, we will review the requirements placed on such information repositories by state-of-the-art machine learning applications and how these requirements can be met.

Authors

  • F Prior
    Department of Biomedical Informatics, University of Arkansas for Medical Sciences, 4301 W. Markham St, Little Rock, AR 72205, USA. Electronic address: fwprior@uams.edu.
  • J Almeida
    National Institutes of Health, National Cancer Institute, 9609 Medical Center Drive, Bethesda, MD 20892, USA.
  • P Kathiravelu
    Department of Biomedical Informatics, Emory University, 101 Woodruff Circle, #4104, Atlanta, GA 30322, USA.
  • T Kurc
    Department of Biomedical Informatics, Stony Brook University, Health Science Center Level 3, Room 043, Stony Brook, NY 11794, USA.
  • K Smith
    Department of Biomedical Informatics, University of Arkansas for Medical Sciences, 4301 W. Markham St, Little Rock, AR 72205, USA.
  • T J Fitzgerald
    Department of Radiation Oncology, University of Massachusetts Medical School, Worcester, MA 01655, USA.
  • J Saltz
    Department of Biomedical Informatics, Stony Brook University, Health Science Center Level 3, Room 043, Stony Brook, NY 11794, USA.