ASiDentify (ASiD): a machine learning model to predict new autism spectrum disorder risk genes.
Journal: Genetics
PMID: 40088463
Abstract
Autism spectrum disorder (ASD) is a neurodevelopmental disorder that affects nearly 3% of children and has a strong genetic component. Although sequencing studies have identified hundreds of ASD risk genes, the genetic heterogeneity of ASD makes it challenging to identify additional risk genes with these methods. To predict candidate ASD risk genes, we developed a simple machine learning model, ASiDentify (ASiD), using human genomic, RNA- and protein-based features. ASiD identified over 1,300 candidate ASD risk genes, over 300 of which had not been previously predicted. ASiD made accurate predictions using 6 features predictive of ASD risk gene status, including mutational constraint, synapse localization, and gene expression in neurons, astrocytes and non-brain tissues. Functional groups of proteins found to be strongly implicated in ASD include RNA-binding proteins (RBPs) and chromatin regulators. We constructed additional logistic regression models to make predictions and assess informative features specific to each group: for RBPs, informative features included mutational constraint; for chromatin regulators, both expression level in excitatory neurons and mutational constraint were informative. The fact that RBPs and chromatin regulators had informative features distinct from those of all protein-coding genes suggests that specific biological pathways connect risk genes with different molecular functions to ASD.
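
The abstract describes logistic regression models that score genes from a small set of features and then inspect which features are informative. The following is a minimal sketch of that general technique, not the authors' implementation. The feature names, the synthetic data, the weights used to generate it, and the training settings are all assumptions made for illustration; no real genomic data or ASiD results are involved.

```python
import math
import random

# Hypothetical feature names, loosely following feature types named in the abstract.
FEATURES = ["mutational_constraint", "synapse_localization", "neuron_expression"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_genes(n, rng):
    """Generate made-up feature vectors and labels (not real genomic data)."""
    true_weights = [2.0, 1.0, 1.5]  # arbitrary values, for illustration only
    rows, labels = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in FEATURES]
        p = sigmoid(sum(w * xi for w, xi in zip(true_weights, x)) - 1.0)
        rows.append(x)
        labels.append(1 if rng.random() < p else 0)
    return rows, labels


def fit_logistic_regression(rows, labels, lr=0.1, epochs=300):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(x, weights, bias):
    """Predicted probability that a gene belongs to the positive class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
rows, labels = make_synthetic_genes(400, rng)
weights, bias = fit_logistic_regression(rows, labels)
accuracy = sum(
    (predict_proba(x, weights, bias) >= 0.5) == bool(y)
    for x, y in zip(rows, labels)
) / len(rows)

# Larger absolute weights indicate more informative (standardized) features.
for name, w in zip(FEATURES, weights):
    print(f"{name}: {w:+.2f}")
print(f"training accuracy: {accuracy:.2f}")
```

Because the synthetic features share a common scale, the magnitudes of the fitted weights can be compared directly as a rough indication of which features are informative. Fitting a separate model on a subset of genes (for example, only RBPs) would follow the same steps with a filtered input.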