DECODE: a Deep-learning framework for Condensing enhancers and refining boundaries with large-scale functional assays.

Journal: Bioinformatics (Oxford, England)
PMID:

Abstract

MOTIVATION: Mapping distal regulatory elements, such as enhancers, is a cornerstone for elucidating how genetic variations may influence diseases. Previous enhancer-prediction methods have used either unsupervised approaches or supervised methods with limited training data. Moreover, past approaches have implemented enhancer discovery as a binary classification problem without accurate boundary detection. This produces low-resolution annotations with superfluous regions and reduces the statistical power of downstream analyses (e.g. causal-variant mapping and functional validation). Here, we addressed these challenges via a two-step model called Deep-learning framework for Condensing enhancers and refining boundaries with large-scale functional assays (DECODE). First, we employed direct enhancer-activity readouts from novel functional characterization assays, such as STARR-seq, to train a deep neural network for accurate cell-type-specific enhancer prediction. Second, to improve the annotation resolution, we implemented a weakly supervised object detection framework that uses Gradient-weighted Class Activation Mapping (Grad-CAM) to localize enhancers with precise boundaries, at 10 bp resolution.
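The abstract describes the Grad-CAM boundary step only at a high level. Below is a minimal sketch of how Grad-CAM can turn a sequence classifier's internal activations into enhancer boundaries at 10 bp resolution. It is not the DECODE implementation. It assumes a toy one-layer convolutional network with global average pooling and a linear head, where the Grad-CAM gradient has a closed form. A hand-set, hypothetical "GATA" filter stands in for trained weights. The boundary rule is also hypothetical: keep the longest run of 10 bp bins whose mean activation is at least half the peak bin. All function names are illustrative.

```python
from typing import List, Optional, Tuple

BASES = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Encode a DNA string as L one-hot rows over (A, C, G, T)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def conv1d_relu(x: List[List[float]], filters, biases) -> List[List[float]]:
    """'Same'-padded 1D convolution + ReLU; returns K x L feature maps A[k][t]."""
    L = len(x)
    maps = []
    for f, b in zip(filters, biases):
        pad = (len(f) - 1) // 2
        row = []
        for t in range(L):
            s = b
            for j, w_j in enumerate(f):
                pos = t - pad + j
                if 0 <= pos < L:  # positions outside the sequence act as zero padding
                    s += sum(w_j[c] * x[pos][c] for c in range(4))
            row.append(max(0.0, s))
        maps.append(row)
    return maps


def grad_cam(maps: List[List[float]], head_weights: List[float]) -> List[float]:
    """Grad-CAM for conv -> global average pool -> linear score.

    The score is y = sum_k w_k * mean_t A[k][t], so dy/dA[k][t] = w_k / L at every
    position; averaging that gradient over positions gives the channel weight
    alpha_k = w_k / L. The map is ReLU(sum_k alpha_k * A[k][t]).
    """
    L = len(maps[0])
    alphas = [w / L for w in head_weights]
    return [max(0.0, sum(a * m[t] for a, m in zip(alphas, maps))) for t in range(L)]


def localize(cam: List[float], bin_size: int = 10,
             rel_threshold: float = 0.5) -> Optional[Tuple[int, int]]:
    """Return (start, end) in bp, end exclusive, of the longest run of bins whose
    mean activation is >= rel_threshold * the peak bin; None if the map is all zero."""
    bins = []
    for i in range(0, len(cam), bin_size):
        chunk = cam[i:i + bin_size]
        bins.append(sum(chunk) / len(chunk))
    peak = max(bins)
    if peak == 0:
        return None
    thr = rel_threshold * peak
    best, run_start = None, None
    for i, v in enumerate(bins + [float("-inf")]):  # sentinel closes a trailing run
        if v >= thr and run_start is None:
            run_start = i
        elif v < thr and run_start is not None:
            if best is None or i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = None
    return best[0] * bin_size, min(best[1] * bin_size, len(cam))


if __name__ == "__main__":
    # Hypothetical hand-set "GATA" filter: +1 for the matching base, -1 otherwise;
    # bias -3 means only an exact 4-bp match produces a positive activation.
    gata = [[1.0 if BASES[c] == m else -1.0 for c in range(4)] for m in "GATA"]
    seq = "C" * 100 + "GATA" * 10 + "C" * 60  # 200 bp, motif cluster at 100-139
    A = conv1d_relu(one_hot(seq), [gata], [-3.0])
    cam = grad_cam(A, [1.0])
    print(localize(cam))  # (100, 140)
```

In this toy, the recovered interval matches the planted motif cluster exactly. A trained, deeper network would not give the Grad-CAM weights in closed form. There, the channel weights would be computed from gradients via automatic differentiation, and the threshold and bin size would be tuning choices.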

Authors

  • Zhanlin Chen
    Department of Statistics & Data Science, Yale University, New Haven, CT 06520, USA.
  • Jing Zhang
    MOEMIL Laboratory, School of Optoelectronic Information, University of Electronic Science and Technology of China, Chengdu, China.
  • Jason Liu
    Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America.
  • Yi Dai
    Department of Computer Science, University of California, Irvine, CA 92617, USA.
  • Donghoon Lee
    Department of Radiation Convergence Engineering, Research Institute of Health Science, Yonsei University, 1 Yonseidae-gil, Wonju, Gangwon, 26493, Korea.
  • Martin Renqiang Min
    Department of Machine Learning, NEC Laboratories America, Princeton, NJ 08540, USA.
  • Min Xu
    Department of Gastroenterology, Shanghai First People's Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, People's Republic of China.
  • Mark Gerstein
    Program of Computational Biology and Bioinformatics and Department of Molecular Biophysics and Biochemistry and Department of Computer Science, Yale University, New Haven, CT 06511, USA.