Avocado: a multi-scale deep tensor factorization method learns a latent representation of the human epigenome.

Journal: Genome Biology
PMID:

Abstract

The human epigenome has been experimentally characterized by thousands of measurements for every base pair in the human genome. We propose a deep neural network tensor factorization method, Avocado, that compresses this epigenomic data into a dense, information-rich representation. We use this learned representation to impute epigenomic data more accurately than previous methods, and we show that machine learning models that exploit this representation outperform those trained directly on epigenomic data on a variety of genomics tasks. These tasks include predicting gene expression, promoter-enhancer interactions, replication timing, and an element of 3D chromatin architecture.
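The idea of a "deep tensor factorization" can be illustrated with a minimal sketch. It treats the epigenome as a three-way tensor indexed by (cell type, assay, genomic position). Each entry is predicted by concatenating a learned cell-type embedding, a learned assay embedding, and genomic-position embeddings at several resolutions, then passing the result through a small ReLU network. The per-resolution position tables reflect the "multi-scale" in the title. All sizes, bin widths, and names (`SCALES`, `features`, `predict`, `mse`) are hypothetical and chosen for illustration. This is not the authors' implementation, and training is omitted.

```python
import math
import random

# Hypothetical tensor and embedding sizes, for illustration only.
N_CELL_TYPES, N_ASSAYS, N_POSITIONS = 3, 4, 1000
CELL_DIM, ASSAY_DIM = 4, 4
# Assumed multi-scale position factors: bin width (in positions) -> embedding dim.
SCALES = {1: 3, 10: 3, 200: 3}
HIDDEN = 8

rng = random.Random(0)

def table(rows, dim):
    """Randomly initialized embedding table: a list of `rows` vectors of length `dim`."""
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(rows)]

cell_emb = table(N_CELL_TYPES, CELL_DIM)
assay_emb = table(N_ASSAYS, ASSAY_DIM)
# Coarser scales need fewer rows: position p uses row p // bin_width.
pos_emb = {b: table(math.ceil(N_POSITIONS / b), d) for b, d in SCALES.items()}

in_dim = CELL_DIM + ASSAY_DIM + sum(SCALES.values())
W1, b1 = table(HIDDEN, in_dim), [0.0] * HIDDEN
W2, b2 = table(1, HIDDEN)[0], 0.0

def features(cell, assay, pos):
    """Concatenate cell-type, assay, and (fine-to-coarse) position factors."""
    x = cell_emb[cell] + assay_emb[assay]
    for b in sorted(SCALES):
        x = x + pos_emb[b][pos // b]
    return x

def predict(cell, assay, pos):
    """Predict one tensor entry with a one-hidden-layer ReLU network."""
    x = features(cell, assay, pos)
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi)
         for row, bi in zip(W1, b1)]
    return sum(w * hi for w, hi in zip(W2, h)) + b2

def mse(observed):
    """Mean squared error over observed (cell, assay, pos, value) tuples."""
    return sum((predict(c, a, p) - v) ** 2 for c, a, p, v in observed) / len(observed)
```

In a full model, the embedding tables and network weights would be fit jointly by minimizing a loss such as `mse` over the observed entries. Imputation then amounts to calling `predict` on (cell type, assay) combinations that were never measured. The fitted embedding tables, in particular the position factors, are the kind of dense learned representation the abstract describes reusing for downstream tasks.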

Authors

  • Jacob Schreiber
    Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA.
  • Timothy Durham
    Department of Genome Sciences, University of Washington, Seattle, USA.
  • Jeffrey Bilmes
    Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA.
  • William Stafford Noble
    [1] Department of Computer Science and Engineering, University of Washington, 185 Stevens Way, Seattle, Washington 98195-2350, USA.
    [2] Department of Genome Sciences, University of Washington, 3720 15th Ave NE Seattle, Washington 98195-5065, USA.