SENet: A deep learning framework for discriminating super- and typical enhancers by sequence information.

Journal: Computational biology and chemistry
PMID:

Abstract

Super-enhancers are large genomic domains formed when multiple short typical enhancers lying within a specific genomic distance of one another are stitched together. They are typically cell type-specific, define cell identity, and regulate gene transcription. Numerous studies have shown that super-enhancers are enriched for trait-associated variants and that mutations in super-enhancers may be related to known diseases. Recently, several machine learning-based methods have been used to distinguish super-enhancers from typical enhancers using high-throughput data from various experimental methods, but acquiring such experimental data is usually costly and time-consuming. In this paper, we propose SENet, a method based on a deep neural network model that discriminates between the two categories using sequence information alone. SENet employs dna2vec feature embedding, convolution for local feature extraction, attention pooling for refined feature retention, and a Transformer for contextual information extraction. Experiments demonstrate that SENet outperforms all current state-of-the-art computational methods and shows satisfactory performance in cross-species validation. Our method is the first to distinguish super-enhancers from typical enhancers using only sequence information. The source code and datasets are available at https://github.com/lhy0322/SENet.
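
As a rough illustration of two of the components named above, the following Python sketch shows k-mer tokenization (the input step for a dna2vec-style embedding lookup) and a softmax-weighted attention pooling over consecutive positions. It is a minimal sketch under stated assumptions, not the SENet implementation: the abstract does not give the k-mer length, stride, pooling window, or how attention scores are computed, so `k=6`, `stride=1`, `window=2`, and the function names `kmer_tokens` and `attention_pool` are hypothetical choices made only for this example. In a trained model the scores would come from learned parameters; here they are passed in directly.

```python
import math
from typing import List, Sequence


def kmer_tokens(seq: str, k: int = 6, stride: int = 1) -> List[str]:
    """Split a DNA sequence into overlapping k-mers with a sliding window.

    k and stride are illustrative assumptions, not values from the paper.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features: Sequence[Sequence[float]],
                   scores: Sequence[float],
                   window: int = 2) -> List[List[float]]:
    """Pool non-overlapping windows of positions using softmax weights.

    features holds one vector per position and scores one scalar per
    position. Each output vector is the weighted sum of the vectors in
    its window, so high-scoring positions dominate the pooled result.
    Trailing positions that do not fill a whole window are dropped.
    """
    pooled = []
    for start in range(0, len(features) - window + 1, window):
        block = features[start:start + window]
        weights = softmax(scores[start:start + window])
        dim = len(block[0])
        pooled.append([
            sum(w * vec[d] for w, vec in zip(weights, block))
            for d in range(dim)
        ])
    return pooled


# Tokenize a short toy sequence into 6-mers.
tokens = kmer_tokens("ACGTACGTAC", k=6)

# Pool two toy 2-dimensional feature vectors with equal scores.
pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], window=2)
```

With equal scores the pooling reduces to an average of the window; unequal scores shift the result toward the higher-scoring position, which is the sense in which attention pooling retains selected features rather than taking a plain mean or maximum.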

Authors

  • Hanyu Luo
    Department of Cardiology of Lu'an People's Hospital, Lu'an Hospital of Anhui Medical University, Lu'an, China.
  • Ye Li
    Environment and Plant Protection Institute, Chinese Academy of Tropical Agricultural Science, Haikou 571010, People's Republic of China; Key Laboratory of Monitoring and Control of Tropical Agricultural and Forest Invasive Alien Pests, Ministry of Agriculture, Haikou 571010, People's Republic of China.
  • Huan Liu
    Department of Chemical and Biochemical Engineering, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, Fujian, China.
  • Pingjian Ding
    Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH, United States.
  • Ying Yu
    School of Chemistry and Environment, Guangzhou Key Laboratory of Analytical Chemistry for Biomedicine, South China Normal University, Guangzhou 510006, PR China. Electronic address: yuyhs@scnu.edu.cn.
  • Lingyun Luo
    School of Computer Sciences, University of South China, Hengyang 421001, China. Electronic address: luoly@usc.edu.cn.