Deep Learning Sequence Models for Transcriptional Regulation.

Journal: Annual Review of Genomics and Human Genetics
PMID:

Abstract

Deciphering the regulatory code of gene expression and interpreting the transcriptional effects of genome variation are critical challenges in human genetics. Modern experimental technologies have resulted in an abundance of data, enabling the development of sequence-based deep learning models that link patterns embedded in DNA to the biochemical and regulatory properties contributing to transcriptional regulation, including modeling epigenetic marks, 3D genome organization, and gene expression, with tissue and cell-type specificity. Such methods can predict the functional consequences of any noncoding variant in the human genome, even rare or never-before-observed variants, and systematically characterize their consequences beyond what is tractable from experiments or quantitative genetics studies alone. Recently, the development and application of interpretability approaches have led to the identification of key sequence patterns contributing to the predicted tasks, providing insights into the underlying biological mechanisms learned and revealing opportunities for improvement in future models.
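The variant-effect idea described in the abstract can be sketched as follows: score the reference sequence and the sequence carrying the alternate allele with the same sequence-based model, then take the difference. This is only a minimal illustration under stated assumptions, not any published model. The function names (`one_hot`, `toy_model`, `variant_effect`) and the hand-written `TATA_FILTER` are hypothetical; `toy_model` is a single fixed filter standing in for a trained deep network.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows over (A, C, G, T); any other symbol (e.g. N) is all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def toy_model(encoded, motif_weights):
    """Hypothetical stand-in for a trained network: one convolution-like filter
    scanned along the sequence, max-pooled, then passed through a sigmoid."""
    width = len(motif_weights)
    best = -math.inf
    for start in range(len(encoded) - width + 1):
        activation = sum(
            encoded[start + i][j] * motif_weights[i][j]
            for i in range(width)
            for j in range(4)
        )
        best = max(best, activation)
    return 1.0 / (1.0 + math.exp(-best))

def variant_effect(ref_seq, pos, alt_base, motif_weights):
    """Predicted effect = model(alternate sequence) - model(reference sequence)."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    ref_score = toy_model(one_hot(ref_seq), motif_weights)
    alt_score = toy_model(one_hot(alt_seq), motif_weights)
    return alt_score - ref_score

# Assumed, illustrative filter: +1 for each base matching "TATA", -1 otherwise.
TATA_FILTER = [[1.0 if b == m else -1.0 for b in BASES] for m in "TATA"]

ref = "GCGCTATAGCGC"
effect_disrupt = variant_effect(ref, 5, "G", TATA_FILTER)  # TATA -> TGTA, inside the motif
effect_neutral = variant_effect(ref, 0, "C", TATA_FILTER)  # substitution outside the motif
```

Because the score depends only on the sequence, the same comparison can be made for any substitution, including variants never observed in a population. In this toy setting the substitution inside the motif lowers the score, while the one outside it leaves the score unchanged.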

Authors

  • Ksenia Sokolova
    Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA; email: sokolova@princeton.edu, kc31@princeton.edu, ogt@cs.princeton.edu.
  • Kathleen M Chen
    Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, USA.
  • Yun Hao
    Flatiron Institute, Simons Foundation, New York, NY, USA; email: yhao@flatironinstitute.org.
  • Jian Zhou
    CTIQ, Canon Medical Research USA, Inc., Vernon Hills, 60061, USA.
  • Olga G Troyanskaya
    Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA; email: ogt@cs.princeton.edu.