Gene pathogenicity prediction of Mendelian diseases via the random forest algorithm.

Journal: Human Genetics
Published Date:

Abstract

The study of Mendelian diseases and the identification of their causative genes are of great significance in genetics. Two important questions are how to evaluate the pathogenicity of genes and how many Mendelian disease genes exist in total. Very few studies have addressed these questions to date, so we attempt to answer them here. We used a machine learning approach (the random forest algorithm) to calculate a gene pathogenicity prediction (GPP) score that evaluates the pathogenicity of genes. Applied to the testing gene set, the GPP score achieved an accuracy of 80%, a recall of 93%, and an area under the curve of 0.87. We estimated that a total of 10,384 protein-coding genes are Mendelian disease genes. Furthermore, we found that the GPP score was positively correlated with disease severity. Our results indicate that the GPP score may provide a robust and reliable guideline for predicting the pathogenicity of protein-coding genes. To our knowledge, this is the first attempt to estimate the total number of Mendelian disease genes.
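
For readers unfamiliar with the three evaluation metrics reported above, the following minimal sketch shows how accuracy, recall, and area under the curve can be computed from per-gene scores and known labels. It is not the authors' pipeline: the function names, the 0.5 decision threshold, and the toy scores and labels are all assumptions made for illustration, and no random forest is trained here.

```python
from typing import Sequence, Tuple


def accuracy_recall(labels: Sequence[int], scores: Sequence[float],
                    threshold: float = 0.5) -> Tuple[float, float]:
    """Accuracy and recall when genes scoring >= threshold are called pathogenic.

    The 0.5 default threshold is an illustrative assumption, not a value
    taken from the study.
    """
    predicted = [s >= threshold for s in scores]
    true_pos = sum(1 for y, p in zip(labels, predicted) if y == 1 and p)
    true_neg = sum(1 for y, p in zip(labels, predicted) if y == 0 and not p)
    n_pos = sum(1 for y in labels if y == 1)
    accuracy = (true_pos + true_neg) / len(labels)
    recall = true_pos / n_pos if n_pos else 0.0
    return accuracy, recall


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen disease gene scores
    higher than a randomly chosen non-disease gene (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = known disease gene, 0 = non-disease gene.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]

toy_accuracy, toy_recall = accuracy_recall(toy_labels, toy_scores)
toy_auc = auc(toy_labels, toy_scores)
print(f"accuracy={toy_accuracy:.2f} recall={toy_recall:.2f} auc={toy_auc:.2f}")
```

Accuracy and recall depend on the chosen threshold, whereas the area under the curve summarizes ranking quality across all thresholds, which is why the two kinds of metric are usually reported together.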

Authors

  • Sijie He
    BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, 518083, China.
  • Weiwei Chen
    Department of Developmental and Behavioral Pediatrics, Shanghai Children's Medical Center affiliated to Shanghai Jiaotong University School of Medicine, Ministry of Education-Shanghai Key Laboratory of Children's Environmental Health, Shanghai, China.
  • Hankui Liu
    BGI-Shenzhen, Shenzhen, 518083, China.
  • Shengting Li
    BGI-Shenzhen, Shenzhen, Guangdong, 518083, China.
  • Dongzhu Lei
    Center of Prenatal Diagnosis, ChenZhou No. 1 People's Hospital, Hunan, 423000, China.
  • Xiao Dang
    BGI-Shenzhen, Shenzhen, 518083, China.
  • Yulan Chen
    BGI-Shenzhen, Shenzhen, 518083, China.
  • Xiuqing Zhang
    BGI-Shenzhen, Shenzhen, Guangdong, 518083, China.
  • Jianguo Zhang
    College of Automation, Harbin Engineering University, No. 145, Nantong Street, Harbin, China.