DeepMethyGene: a deep-learning model to predict gene expression using DNA methylations.

Journal: BMC Bioinformatics
PMID:

Abstract

Gene expression underlies the diverse functions of cells, and DNA methylation is a critical epigenetic mechanism that regulates it. Here we propose DeepMethyGene, an adaptive recursive convolutional neural network model based on ResNet that predicts gene expression from DNA methylation information. The model converts methylation Beta values to M values so that the input data are closer to Gaussian-distributed, dynamically adjusts the number of output channels according to the input dimension, and uses residual blocks to mitigate vanishing gradients when training very deep networks. In benchmarks against the state-of-the-art geneEXPLORE model (R = 0.449), DeepMethyGene achieved superior predictive performance (R = 0.640). Further analysis revealed that the number of methylation sites and the average distance between these sites and the gene transcription start site (TSS) significantly affected prediction accuracy. By exploring the complex relationship between methylation and gene expression, this study provides theoretical support for disease progression prediction and clinical intervention. Relevant data and code are available at https://github.com/yaoyao-11/DeepMethyGene.
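
The Beta-to-M conversion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula or preprocessing used by DeepMethyGene, so this assumes the commonly used definition M = log2(Beta / (1 - Beta)). The function names beta_to_m and m_to_beta and the clipping constant eps are hypothetical choices made for illustration, not taken from the authors' code.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation Beta value in [0, 1] to an M value.

    Assumes the common definition M = log2(Beta / (1 - Beta)).
    eps is an illustrative clipping constant (an assumption, not from
    the paper) that keeps the logarithm finite at Beta = 0 or Beta = 1.
    """
    clipped = min(max(beta, eps), 1.0 - eps)
    return math.log2(clipped / (1.0 - clipped))


def m_to_beta(m):
    """Invert the transformation: Beta = 2^M / (2^M + 1)."""
    power = 2.0 ** m
    return power / (power + 1.0)


# Example Beta values for three hypothetical methylation sites.
betas = [0.1, 0.5, 0.9]
m_values = [beta_to_m(b) for b in betas]
```

Beta values are bounded between 0 and 1 and are compressed near both ends, whereas M values are unbounded and symmetric around 0 (Beta = 0.5 maps to M = 0), which is one reason M values are often preferred as inputs for models that behave better on roughly Gaussian data.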

Authors

  • Yuyao Yan
    Department of Intelligent Science, Xi'an Jiaotong-Liverpool University, China.
  • Xinyi Chai
    CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China.
  • Jiajun Liu
    School of Computer Science and Engineering, Southeast University, Nanjing 210018, China.
  • Sijia Wang
    CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China.
  • Wenran Li
    Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Department of Automation and Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, 100084, China.
  • Tao Huang
    The Second Clinical Medical College of Guangzhou University of Chinese Medicine, Guangzhou, China.