DriverML: a machine learning algorithm for identifying driver genes in cancer sequencing studies.

Journal: Nucleic acids research
Published Date:

Abstract

Although rapid progress has been made in computational approaches for prioritizing cancer driver genes, research is far from achieving the ultimate goal of discovering a complete catalog of genes truly associated with cancer. Driver gene lists predicted from these computational tools lack consistency and are prone to false positives. Here, we developed an approach (DriverML) integrating Rao's score test and supervised machine learning to identify cancer driver genes. The weight parameters in the score statistics quantified the functional impacts of mutations on the protein. To obtain optimized weight parameters, the score statistics of prior driver genes were maximized on pan-cancer training data. We conducted rigorous and unbiased benchmark analysis and comparisons of DriverML with 20 other existing tools in 31 independent datasets from The Cancer Genome Atlas (TCGA). Our comprehensive evaluations demonstrated that DriverML was robust and powerful among various datasets and outperformed the other tools with a better balance of precision and sensitivity. In vitro cell-based assays further proved the validity of the DriverML prediction of novel driver genes. In summary, DriverML uses an innovative, machine learning-based approach to prioritize cancer driver genes and provides dramatic improvements over currently existing methods. Its source code is available at https://github.com/HelloYiHan/DriverML.
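
As a rough illustration of the idea described in the abstract, the sketch below computes a weighted Rao score statistic for a single gene and then picks weights that maximize the summed statistic over a set of prior driver genes. It is not the DriverML implementation. The Poisson model, the function names (weighted_score_z, upper_tail_p, pick_weights), the three mutation types, the example counts, and the grid search over candidate weights are all assumptions made for illustration; the repository linked above is the place to find the authors' actual model and optimizer.

```python
import math


def weighted_score_z(observed, expected, weights):
    """One-sided Rao score statistic for eta = 0 in a toy Poisson model.

    Assumed model (illustration only): the count of mutation type t in a gene
    is Poisson with mean expected[t] * exp(weights[t] * eta), so eta = 0 means
    the gene mutates at the background rate.
    """
    if not (len(observed) == len(expected) == len(weights)):
        raise ValueError("inputs must have the same length")
    # Score: derivative of the log-likelihood with respect to eta at eta = 0.
    score = sum(w * (n - e) for n, e, w in zip(observed, expected, weights))
    # Fisher information for eta at eta = 0.
    info = sum(w * w * e for e, w in zip(expected, weights))
    if info <= 0:
        raise ValueError("Fisher information must be positive")
    return score / math.sqrt(info)


def upper_tail_p(z):
    """Upper-tail p-value of a standard normal statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def pick_weights(training_genes, candidates):
    """Return the candidate weight vector with the largest summed z.

    A crude grid search standing in for a real optimizer (assumption).
    training_genes is a list of (observed, expected) pairs for prior drivers.
    """
    return max(
        candidates,
        key=lambda w: sum(weighted_score_z(o, e, w) for o, e in training_genes),
    )


# Hypothetical counts for three mutation types in two "prior driver" genes.
training_genes = [
    ([12, 5, 3], [4.0, 1.0, 3.0]),
    ([6, 4, 2], [3.0, 0.5, 2.0]),
]
candidates = [(1.0, 1.0, 1.0), (0.5, 1.0, 0.0), (0.0, 0.0, 1.0)]

best = pick_weights(training_genes, candidates)
z = weighted_score_z(training_genes[0][0], training_genes[0][1], best)
p = upper_tail_p(z)
print(best, z, p)
```

In this toy model the statistic does not change when all weights are multiplied by the same positive constant, so only the relative weights of the mutation types matter. That is why comparing a handful of candidate directions is meaningful here without any normalization constraint.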

Authors

  • Yi Han
    Department of Anesthesiology, the Second Hospital of Shanxi Medical University, Taiyuan 030001, Shanxi, China. Corresponding author: Han Yi, Email: 13753171979@163.com.
  • Juze Yang
    Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310016, China.
  • Xinyi Qian
    Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310016, China.
  • Wei-Chung Cheng
    Graduate Institute of Biomedical Sciences, Research Center for Tumor Medical Science, and Drug Development Center, China Medical University, Taichung 40402, Taiwan.
  • Shu-Hsuan Liu
    Graduate Institute of Biomedical Sciences, Research Center for Tumor Medical Science, and Drug Development Center, China Medical University, Taichung 40402, Taiwan.
  • Xing Hua
    Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD 20892, USA.
  • Liyuan Zhou
    Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310016, China.
  • Yaning Yang
    Department of Statistics and Finance, University of Science and Technology of China, Hefei, Anhui 230026, China.
  • Qingbiao Wu
    School of Mathematical Sciences, Zhejiang University, Hangzhou, Zhejiang, China.
  • Pengyuan Liu
    Center of Systems Molecular Medicine, Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin.
  • Yan Lu
    National Institute of Standards and Technology.