A supervised learning framework for chromatin loop detection in genome-wide contact maps.

Journal: Nature Communications
Published Date:

Abstract

Accurately predicting chromatin loops from genome-wide interaction matrices, such as those from Hi-C, is critical to deepening our understanding of proper gene regulation. Current approaches mainly search for statistically enriched dots on a genome-wide map. However, given the availability of orthogonal data types such as ChIA-PET, HiChIP, Capture Hi-C, and high-throughput imaging, a supervised learning approach could facilitate the discovery of a comprehensive set of chromatin interactions. Here, we present Peakachu, a Random Forest classification framework that predicts chromatin loops from genome-wide contact maps. We compare Peakachu with current enrichment-based approaches and find that Peakachu identifies a unique set of short-range interactions. We show that our models perform well on different platforms, across different sequencing depths, and across different species. We apply this framework to predict chromatin loops in 56 Hi-C datasets and release the results through the 3D Genome Browser.

Authors

  • Tarik J Salameh
    Bioinformatics and Genomics Program, The Pennsylvania State University, University Park, State College, PA, 16802, USA.
  • Xiaotao Wang
    Department of Biochemistry and Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA. xiaotao.wang@northwestern.edu.
  • Fan Song
    Bioinformatics and Genomics Program, The Pennsylvania State University, University Park, State College, PA, 16802, USA.
  • Bo Zhang
    Department of Clinical Pharmacology, Key Laboratory of Clinical Cancer Pharmacology and Toxicology Research of Zhejiang Province, Affiliated Hangzhou First People's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310006, PR China.
  • Sage M Wright
    Bioinformatics and Genomics Program, The Pennsylvania State University, University Park, State College, PA, 16802, USA.
  • Chachrit Khunsriraksakul
    Bioinformatics and Genomics Program, The Pennsylvania State University, University Park, State College, PA, 16802, USA.
  • Yijun Ruan
    The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA.
  • Feng Yue
    Bioinformatics and Genomics Program, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, 16802, USA. fyue@hmc.psu.edu.