Machine learning in rare disease.

Journal: Nature Methods
Published Date:

Abstract

High-throughput profiling methods (such as genomics or imaging) have accelerated basic research and made deep molecular characterization of patient samples routine. These approaches provide a rich portrait of the genes, molecular pathways and cell types involved in disease phenotypes. Machine learning (ML) can be a useful tool for extracting disease-relevant patterns from high-dimensional datasets. However, depending on the complexity of the biological question, ML often requires many samples to identify recurrent and biologically meaningful patterns. Rare diseases inherently have few clinical cases, so few samples are available to study. In this Perspective, we outline the challenges and emerging solutions for using ML with small sample sets, specifically in rare diseases. Advances in ML methods for rare diseases are likely to be informative for other applications in which high-dimensional data exist for only a few samples. We propose that the methods community prioritize the development of ML techniques for rare disease research.

Authors

  • Jineta Banerjee
    Sage Bionetworks, Seattle, WA, USA.
  • Jaclyn N Taroni
    Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
  • Robert J Allaway
    Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Dartmouth College, HB 7650, Hanover, NH, 03755, USA.
  • Deepashree Venkatesh Prasad
    Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Philadelphia, PA, USA.
  • Justin Guinney
    Computational Oncology, Sage Bionetworks, Seattle, WA, USA.
  • Casey Greene
    Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA. casey.s.greene@cuanschutz.edu.