CREATE: cell-type-specific cis-regulatory element identification via discrete embedding.

Journal: Nature Communications
PMID:

Abstract

Cis-regulatory elements (CREs), including enhancers, silencers, promoters and insulators, play pivotal roles in orchestrating gene regulatory mechanisms that drive complex biological traits. However, current approaches for CRE identification are predominantly sequence-based and typically focus on individual CRE types, limiting insights into their cell-type-specific functions and regulatory dynamics. Here, we present CREATE, a multimodal deep learning framework based on a Vector Quantized Variational AutoEncoder (VQ-VAE), tailored for comprehensive CRE identification and characterization. CREATE integrates genomic sequences, chromatin accessibility, and chromatin interaction data to generate discrete CRE embeddings, enabling accurate multi-class classification and robust characterization of CREs. CREATE excels in identifying cell-type-specific CREs, and provides quantitative and interpretable insights into CRE-specific features, uncovering the underlying regulatory codes. By facilitating large-scale prediction of CREs in specific cell types, CREATE enhances the recognition of disease- or phenotype-associated variation in CREs, thus advancing our understanding of gene regulatory landscapes and their roles in health and disease.
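
The "discrete embedding" idea behind vector-quantized autoencoders can be illustrated with a minimal sketch: a continuous encoder output is replaced by its nearest entry in a learned codebook, and the index of that entry serves as the discrete code. The example below is not CREATE's implementation; the function name `quantize`, the toy codebook, and the two-dimensional vectors are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: nearest-codebook lookup, the quantization step
# used in VQ-VAE-style models. The codebook values, vector size, and
# function name are hypothetical and are not taken from CREATE.

def quantize(vector, codebook):
    """Return (index, code) for the codebook entry closest to `vector`
    under squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]


# Toy codebook with three 2-dimensional entries (assumed values).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]

# A stand-in for a continuous encoder output (assumed values).
encoder_output = [0.9, 1.2]

index, code = quantize(encoder_output, codebook)
print(index, code)  # prints: 1 [1.0, 1.0]
```

In a trained model the codebook is learned jointly with the encoder and decoder, and the resulting indices form the discrete representation that downstream classification operates on; the lookup itself stays as simple as shown here.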

Authors

  • Xuejian Cui
    Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China.
  • Qijin Yin
    MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST; Department of Automation, Tsinghua University, Beijing 100084, China.
  • Zijing Gao
    Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China.
  • Zhen Li
    PepsiCo R&D, Valhalla, NY, United States.
  • Xiaoyang Chen
    Department of Pulmonary and Critical Care Medicine, The Second Hospital of Fujian Medical University, Quanzhou, Fujian Province, China.
  • Hairong Lv
    Department of Automation, Tsinghua University, Beijing, China; MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic and Systems Biology, TNLIST, China. Electronic address: lvhairong@tsinghua.edu.cn.
  • Shengquan Chen
    MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Beijing, 100084, China.
  • Qiao Liu
    MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST; Department of Automation, Tsinghua University, Beijing 100084, China.
  • Wanwen Zeng
    MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing, China.
  • Rui Jiang
    Department of Urology, The Affiliated Hospital of Southwest Medical University, Luzhou, China.