Characterization and machine learning prediction of allele-specific DNA methylation.

Journal: Genomics
Published Date:

Abstract

A large collection of single nucleotide polymorphisms (SNPs) has been identified in the human genome. Currently, the epigenetic influences of SNPs on their neighboring CpG sites remain elusive. A growing body of evidence suggests that locus-specific information, including genomic features and local epigenetic state, may play important roles in the epigenetic readout of SNPs. In this study, we used mouse methylomes with known SNPs to develop statistical models for the prediction of SNP-associated allele-specific DNA methylation (ASM). ASM has been classified into parent-of-origin-dependent ASM (P-ASM) and sequence-dependent ASM (S-ASM), the latter comprising scattered-S-ASM (sS-ASM) and clustered-S-ASM (cS-ASM). We found that P-ASM and cS-ASM CpG sites are both enriched in CpG-rich regions, promoters and exons, while sS-ASM CpG sites are enriched in simple repeats and in regions with a high frequency of SNPs. Using Lasso-grouped Logistic Regression (LGLR), we selected 21 out of 282 genomic and methylation-related features that effectively distinguish cS-ASM CpG sites, and trained classifiers using machine learning techniques. Based on 5-fold cross-validation, the logistic regression classifier performed best for cS-ASM prediction, with an accuracy (ACC) of 0.77, an area under the curve (AUC) of 0.84 and a Matthews correlation coefficient (MCC) of 0.54. Lastly, we applied the logistic regression classifier to a human brain methylome and predicted 608 genes associated with cS-ASM. Gene ontology term enrichment analysis indicated that these cS-ASM-associated genes are significantly enriched in the category coding for transcripts with alternative splicing forms. In summary, this study provided an analytical procedure for cS-ASM prediction and shed new light on the understanding of different types of ASM events.
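
For readers unfamiliar with the three evaluation metrics reported above, the following is a minimal sketch of how ACC, AUC and MCC are commonly computed for a binary classifier. It is not the authors' code. The function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and AUC is taken here to mean the area under the ROC curve.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """ACC: fraction of sites classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def auc(y_true, scores):
    """ROC AUC as the probability that a positive outranks a negative.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 1 = cS-ASM site, 0 = non-ASM site.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
# Assumed decision threshold of 0.5 on the predicted probability.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

print(accuracy(toy_labels, toy_preds))
print(mcc(toy_labels, toy_preds))
print(auc(toy_labels, toy_scores))
```

In a 5-fold cross-validation setting, metrics like these would typically be computed on each held-out fold and then averaged. The abstract does not state how the reported values were aggregated.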

Authors

  • Jianlin He
    Department of Pharmacology, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, 210029, People's Republic of China.
  • Ming-an Sun
    Epigenomics and Computational Biology Lab, Virginia Bioinformatics Institute, Virginia Tech, VA 24060, USA. Electronic address: mingansun@gmail.com.
  • Zhong Wang
    Department of Intensive Care Unit, The First Hospital of China Medical University, Shenyang, Liaoning, China.
  • Qianfei Wang
    Laboratory of Genome Variation and Precision Biomedicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China. Electronic address: jefferyqfwang@gmail.com.
  • Qing Li
    Department of Internal Medicine, University of Michigan Ann Arbor, MI 48109, USA.
  • Hehuang Xie
    Laboratory of Genome Variation and Precision Biomedicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; Epigenomics and Computational Biology Lab, Virginia Bioinformatics Institute, Virginia Tech, VA 24060, USA; Department of Biological Sciences, Virginia Tech, VA 24060, USA. Electronic address: davidxie@vt.edu.