Personal transcriptome variation is poorly explained by current genomic deep learning models.

Journal: Nature Genetics
Published Date:

Abstract

Genomic deep learning models can predict genome-wide epigenetic features and gene expression levels directly from DNA sequence. While current models perform well at predicting gene expression levels across genes in different cell types from the reference genome, their ability to explain expression variation between individuals due to cis-regulatory genetic variants remains largely unexplored. Here, we evaluate four state-of-the-art models on paired personal genome and transcriptome data and find limited performance when explaining variation in expression across individuals. In addition, models often fail to predict the correct direction of effect of cis-regulatory genetic variation on expression.
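The distinction the abstract draws, between predicting expression across genes and explaining expression variation across individuals, can be made concrete with a small sketch. The code below is an illustration only, not the authors' evaluation pipeline: the function names, the toy gene labels and values, and the use of the sign of a per-gene Pearson correlation as a proxy for "correct direction of effect" are all assumptions introduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def cross_individual_scores(observed, predicted):
    """For each gene, correlate predicted with observed expression ACROSS individuals.

    Both arguments map a gene name to a list of values, one per individual,
    with individuals in the same order in both dicts (hypothetical layout).
    """
    return {gene: pearson(observed[gene], predicted[gene]) for gene in observed}


def fraction_positive(scores):
    """Fraction of genes with a positive cross-individual correlation.

    Used here as a rough proxy (an assumption) for getting the direction
    of effect right; genes with undefined correlation are skipped.
    """
    valid = [r for r in scores.values() if not math.isnan(r)]
    return sum(r > 0 for r in valid) / len(valid)


# Toy data: four individuals, two hypothetical genes.
observed = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [1.0, 2.0, 3.0, 4.0],
}
predicted = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0],  # tracks the observed ordering of individuals
    "GENE_B": [4.0, 3.0, 2.0, 1.0],  # reverses the observed ordering of individuals
}

scores = cross_individual_scores(observed, predicted)
share_positive = fraction_positive(scores)
```

In this toy case the predictions for GENE_B have the same mean level as the observations, so a cross-gene comparison of average expression would look fine, yet the ranking of individuals is inverted. That is the kind of failure a per-gene, cross-individual metric exposes.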

Authors

  • Connie Huang
    Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA, USA.
  • Richard W Shuai
    Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA, USA.
  • Parth Baokar
    Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA, USA.
  • Ryan Chung
    Department of Radiology, NYU Langone Health, New York, New York.
  • Ruchir Rastogi
    Computer Science Division, University of California, Berkeley, 94720, CA, USA.
  • Pooja Kathail
    Center for Computational Biology, University of California Berkeley, Berkeley, CA, USA.
  • Nilah M Ioannidis
    Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA, USA. Email: nilah@berkeley.edu