Statistical and Machine Learning Methods for Discovering Prognostic Biomarkers for Survival Outcomes.

Journal: Methods in molecular biology (Clifton, N.J.)
Published Date:

Abstract

Discovering molecular biomarkers for predicting patient survival outcomes is an essential step toward improving prognosis and therapeutic decision-making in the treatment of severe diseases such as cancer. Because omics datasets are high-dimensional, statistical methods such as the least absolute shrinkage and selection operator (Lasso) have been widely applied for cancer biomarker discovery. Owing to their scalability and demonstrated prediction performance, machine learning methods such as XGBoost and neural network models have also been gaining popularity in the community recently. However, compared to more traditional survival methods such as Kaplan-Meier estimation and Cox regression, high-dimensional methods for survival outcomes are still less well known to biomedical researchers. In this chapter, we discuss the key analytical procedures in employing these methods for identifying biomarkers associated with survival data. We also identify important considerations that emerged from the analysis of actual omics data, and discuss some typical instances of misapplication and misinterpretation of machine learning methods. Using lung cancer and head and neck cancer datasets as demonstrations, we provide step-by-step instructions and sample R code for prioritizing prognostic biomarkers.

Authors

  • SiJie Yao
    National Engineering Research Center for Agro-Ecological Big Data Analysis and Application, School of Internet and Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China.
  • Xuefeng Wang
    Department of Advanced Manufacturing and Robotics, College of Engineering, Peking University, Beijing 100871, China.