Few shot learning for phenotype-driven diagnosis of patients with rare genetic diseases.

Journal: NPJ digital medicine

Abstract

There are over 7000 rare diseases, some affecting 3500 or fewer patients in the United States. Due to clinicians' limited experience with such diseases and the heterogeneity of clinical presentations, ~70% of individuals seeking a diagnosis remain undiagnosed. Deep learning has demonstrated success in aiding the diagnosis of common diseases. However, existing approaches require labeled datasets with thousands of diagnosed patients per disease. We present SHEPHERD, a few-shot learning approach for multi-faceted rare disease diagnosis. SHEPHERD performs deep learning over a knowledge graph enriched with rare disease information and is trained on a dataset of simulated rare disease patients. We demonstrate SHEPHERD's effectiveness across diverse diagnostic tasks, performing causal gene discovery, retrieving "patients-like-me", and characterizing novel disease presentations, using real-world cohorts from the Undiagnosed Diseases Network (N = 465), MyGene2 (N = 146), and the Deciphering Developmental Disorders study (N = 1431). SHEPHERD demonstrates the potential of knowledge-grounded deep learning to accelerate rare disease diagnosis.

Authors

  • Emily Alsentzer
    Biomedical Informatics Training Program, Stanford University, Stanford, CA.
  • Michelle M Li
    Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, MA, USA.
  • Shilpa N Kobren
    Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
  • Ayush Noori
    Department of Computer Science, Harvard John A. Paulson School of Engineering and Applied Sciences, Allston, Massachusetts, USA.
  • Isaac S Kohane
    Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA. Isaac_Kohane@hms.harvard.edu.
  • Marinka Zitnik
    Department of Computer Science, Stanford University.
