Application of deep learning algorithm on whole genome sequencing data uncovers structural variants associated with multiple mental disorders in African American patients.

Journal: Molecular psychiatry
PMID:

Abstract

Mental disorders are a global health concern, and their diagnosis can be challenging. Diagnosis is even harder for patients who have more than one type of mental disorder, and especially for toddlers, who cannot complete the questionnaires or standardized rating scales used for diagnosis. In the past decade, multiple genomic association signals have been reported for mental disorders, some of which present attractive drug targets. Concurrently, machine learning algorithms, especially deep learning algorithms, have been successful in the diagnosis and/or labeling of complex diseases, such as attention deficit hyperactivity disorder (ADHD) or cancer. In this study, we focused on eight common mental disorders in African Americans, an ethnic minority population: ADHD, depression, anxiety, autism, intellectual disabilities, speech/language disorder, developmental delays, and oppositional defiant disorder. Blood-derived whole genome sequencing data from 4179 individuals were generated, including 1384 patients with a diagnosis of at least one mental disorder. The burden of genomic variants in coding/non-coding regions was used as the feature vector for the deep learning algorithm. Our model showed ~65% accuracy in differentiating patients from controls. Labeling of patients with multiple disorders was similarly successful, with a Hamming loss below 0.3, while exact diagnostic matches were around 10%. Genes in the genomic regions with the highest weights showed enrichment for biological pathways involved in immune responses, antigen/nucleic acid binding, chemokine signaling, and G-protein receptor activities. Notably, variants in non-coding regions (e.g., ncRNA, intronic, and intergenic) performed as well as variants in coding regions; however, unlike coding-region variants, variants in non-coding regions do not show genomic hotspots and have much narrower standard deviations, indicating that they probably serve as alternative markers.
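
The two multi-label figures quoted in the abstract measure different things, which is why a low Hamming loss can coexist with a low exact-match rate. Hamming loss is the fraction of individual patient-by-disorder labels that are wrong, whereas an exact match requires all eight labels for a patient to be correct at once. The sketch below illustrates the distinction only; it is not the authors' implementation, the function names are hypothetical, and the four-patient label matrix is invented toy data rather than study data.

```python
# Illustrative sketch only: toy labels, hypothetical function names.
# Columns stand for the eight disorders named in the abstract; rows are patients.

def hamming_loss(y_true, y_pred):
    """Fraction of individual (patient, disorder) labels predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def exact_match_ratio(y_true, y_pred):
    """Fraction of patients whose full set of labels is predicted correctly."""
    matches = sum(
        list(true_row) == list(pred_row)
        for true_row, pred_row in zip(y_true, y_pred)
    )
    return matches / len(y_true)


# Invented example: 4 patients x 8 disorder labels (1 = diagnosed).
Y_TRUE = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
]
Y_PRED = [
    [1, 0, 0, 0, 0, 0, 0, 0],  # all eight labels correct
    [1, 0, 0, 0, 0, 0, 0, 0],  # one label missed
    [0, 0, 0, 1, 0, 1, 0, 0],  # one missed, one spurious
    [1, 0, 1, 0, 0, 0, 0, 0],  # one spurious label
]

if __name__ == "__main__":
    print("Hamming loss:", hamming_loss(Y_TRUE, Y_PRED))
    print("Exact-match ratio:", exact_match_ratio(Y_TRUE, Y_PRED))
```

In this toy case only 4 of 32 labels are wrong, yet just 1 of 4 patients is labeled exactly right, because a single wrong label is enough to break an exact match.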

Authors

  • Yichuan Liu
    Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
  • Hui-Qi Qu
    Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
  • Frank D Mentch
    Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
  • Jingchun Qu
    Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
  • Xiao Chang
    Department of Radiation Oncology, School of Medicine, Washington University in Saint Louis, St. Louis, MO 63110, USA.
  • Kenny Nguyen
    Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
  • Lifeng Tian
    Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
  • Joseph Glessner
    Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
  • Patrick M A Sleiman
    Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
  • Hakon Hakonarson
    The Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA.