Effective Cancer Subtype and Stage Prediction via Dropfeature-DNNs.

Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Published Date:

Abstract

Precise prediction of cancer subtype and/or stage is instrumental for cancer diagnosis, treatment and management. However, most existing methods based on genomic profiles suffer from overfitting, high computational complexity, and selected features (i.e., genes) that are not directly related to prediction performance. These deficiencies are largely due to the "high dimensionality and small sample size" (HDSS) nature of molecular data, which is often regarded as an obstacle to applying deep learning, e.g., deep neural networks (DNNs), in biomedicine and cancer research. In this paper, we propose a DNN-based algorithm coupled with a new embedded feature selection technique, named Dropfeature-DNNs, to address these issues. Dropfeature-DNNs can discard irrelevant features (i.e., genes) while training DNNs, and we formulate Dropfeature-DNNs as an iterative AUC optimization problem. As such, an "optimal" feature subset containing meaningful genes for accurate tumor subtype and/or stage prediction is obtained when the AUC optimization converges in the training stage. Because feature subset selection and AUC optimization proceed in step with DNN training, model complexity and computational cost are reduced simultaneously. Rigorous feature subset convergence analysis and error bound inference provide a solid theoretical foundation for the proposed method. Extensive empirical comparisons with benchmark methods further demonstrate the efficacy of Dropfeature-DNNs in cancer subtype and/or stage prediction using HDSS gene expression data from multiple cancer types.
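
The general idea of dropping features during training while monitoring AUC can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no implementation details, so everything here is an assumption made for illustration. A masked logistic model stands in for the DNN, features are ranked by absolute weight, and the loop stops when validation AUC falls. The names auc_score, fit_logistic and drop_features, and the parameters drop_frac and max_rounds, are hypothetical.

```python
import numpy as np

def auc_score(y_true, scores):
    """Rank-based AUC (Mann-Whitney U); tied scores are not rank-averaged."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(np.sum(y_true == 1))
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def fit_logistic(X, y, mask, epochs=200, lr=0.1):
    """Gradient-descent logistic model on masked inputs (assumed stand-in for a DNN)."""
    Xm = X * mask                      # masked-out features contribute nothing
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xm @ w + b)))
        grad = p - y
        w -= lr * (Xm.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w * mask, b

def drop_features(X_tr, y_tr, X_va, y_va, drop_frac=0.2, max_rounds=20):
    """Iteratively drop the weakest active features while validation AUC does not fall."""
    mask = np.ones(X_tr.shape[1])
    w, b = fit_logistic(X_tr, y_tr, mask)
    best_auc = auc_score(y_va, X_va @ w + b)
    for _ in range(max_rounds):
        active = np.flatnonzero(mask)
        n_drop = int(len(active) * drop_frac)
        if n_drop == 0:
            break
        # Candidate features to drop: active ones with the smallest |weight|.
        weakest = active[np.argsort(np.abs(w[active]))[:n_drop]]
        trial = mask.copy()
        trial[weakest] = 0.0
        w_t, b_t = fit_logistic(X_tr, y_tr, trial)
        auc_t = auc_score(y_va, X_va @ w_t + b_t)
        if auc_t < best_auc:
            break                      # stop once dropping hurts validation AUC
        mask, w, b, best_auc = trial, w_t, b_t, auc_t
    return mask.astype(bool), best_auc
```

Calling drop_features(X_train, y_train, X_val, y_val) with 0/1 labels returns a boolean mask over the columns (genes) that were kept, together with the validation AUC of the final masked model. Retraining from scratch in every round keeps the sketch short; a method that selects features in step with training, as the abstract describes, would avoid that repeated cost.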

Authors

  • Zhong Chen
    Institute of HIV/AIDS The First Hospital of Changsha, Changsha, China.
  • Wensheng Zhang
    Department of Anesthesiology, West China Hospital, Sichuan University, Chengdu, China.
  • Hongwen Deng
    Department of Global Biostatistics and Data Science, Center for Bioinformatics and Genomics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana.
  • Kun Zhang
    Philosophy Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America.