Gene expression classification using epigenetic features and DNA sequence composition in the human embryonic stem cell line H1.

Journal: Gene
PMID:

Abstract

Epigenetic factors are known to correlate with gene expression. However, quantitative models that accurately classify highly and lowly expressed genes based on epigenetic factors are currently lacking. In this study, a new machine learning method that combines histone modifications, DNA methylation, DNA accessibility, transcription factors, and trinucleotide composition with support vector machines (SVM) is developed for the human embryonic stem cell line H1. The results indicate that predictive accuracy improves markedly when the epigenetic features are considered. The best model achieves a predictive accuracy of 95.96% and a Matthews correlation coefficient of 0.92 in the 10-fold cross-validation test, and 95.58% and 0.92, respectively, in the independent dataset test. Our model provides a good way to judge whether a gene is highly or lowly expressed from genetic and epigenetic data when expression data for the gene are lacking. A web server, GECES, implementing our analysis method is available at http://202.207.14.87:8032/fuwu/GECES/index.asp, so that other scientists can easily obtain their desired results without going through the mathematical details.
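
As an illustration only, the sketch below shows one common way to compute two of the quantities named in the abstract: a trinucleotide composition vector for a DNA sequence, and the accuracy and Matthews correlation coefficient (MCC) of a binary classifier. The function names, the use of overlapping 3-mers, the skipping of ambiguous bases, and the 1 = highly expressed / 0 = lowly expressed label encoding are assumptions made for this example; the abstract does not specify how the authors computed these features or encoded the classes.

```python
from itertools import product
from math import sqrt

# All 64 trinucleotides in lexicographic order (AAA, AAC, ..., TTT).
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_composition(seq):
    """Return the 64 overlapping 3-mer frequencies of a DNA sequence.

    Assumption: windows containing characters other than A, C, G, T
    (for example N) are skipped rather than counted.
    """
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    # Normalise counts to frequencies; an empty sequence gives all zeros.
    return [counts[k] / total if total else 0.0 for k in TRINUCLEOTIDES]


def accuracy_and_mcc(y_true, y_pred):
    """Return (accuracy, MCC) for binary labels.

    Assumption: 1 marks a highly expressed gene, 0 a lowly expressed one.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc
```

In a pipeline of the kind the abstract describes, a vector like this would be concatenated with the epigenetic features before being passed to an SVM, and the two metrics would be evaluated on each cross-validation fold and on the independent test set.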

Authors

  • Wen-Xia Su
    Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China.
  • Qian-Zhong Li
    Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China. Electronic address: qzli@imu.edu.cn.
  • Lu-Qiang Zhang
    Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China.
  • Guo-Liang Fan
    Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China.
  • Cheng-Yan Wu
    Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China.
  • Zhen-He Yan
    Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China.
  • Yong-Chun Zuo
    The Key Laboratory of Mammalian Reproductive Biology and Biotechnology of the Ministry of Education, College of Life Sciences, Inner Mongolia University, Hohhot 010021, China.