Foundations of Machine Learning-Based Clinical Prediction Modeling: Part V - A Practical Approach to Regression Problems.

Journal: Acta neurochirurgica. Supplement
Published Date:

Abstract

This chapter goes through the steps required to train and validate a simple, machine learning-based clinical prediction model for any continuous outcome. We supply fully structured code for readers to download and execute in parallel to this section, as well as a simulated database of 10,000 glioblastoma patients who underwent microsurgery, in which the outcome to predict is survival from diagnosis in months. We walk the reader through each step, including import, checking, and splitting of data. In terms of pre-processing, we focus on how to practically implement imputation using a k-nearest neighbor algorithm. We also illustrate how to select features based on recursive feature elimination and how to use k-fold cross-validation. We demonstrate a generalized linear model, a generalized additive model, a random forest, a ridge regressor, and a Least Absolute Shrinkage and Selection Operator (LASSO) regressor. Specifically for regression, we discuss how to evaluate root mean square error (RMSE), mean absolute error (MAE), and the R² statistic, as well as how a quantile-quantile plot can be used to assess the performance of the regressor along the spectrum of the outcome variable, similarly to calibration when dealing with binary outcomes. Finally, we explain how to arrive at a measure of variable importance using a universal, nonparametric method.
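To make the regression metrics named in the abstract concrete, the following is a minimal sketch of RMSE, MAE, and R², plus the paired quantiles that underlie a quantile-quantile plot. It is not the chapter's downloadable code: the function names (`rmse`, `mae`, `r_squared`, `qq_pairs`) and the four example survival times are hypothetical, chosen only for illustration, and the sketch uses plain Python rather than any particular modeling library.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: penalizes large errors more heavily."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def mae(y_true, y_pred):
    """Mean absolute error: average size of the error, in outcome units."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def qq_pairs(y_true, y_pred):
    """Pair sorted observed and sorted predicted values.

    With equal sample sizes, these pairs are the points of a
    quantile-quantile plot; points near the diagonal suggest the
    predicted distribution matches the observed one across its range.
    """
    return list(zip(sorted(y_true), sorted(y_pred)))

# Hypothetical survival times in months (illustrative values only).
observed = [10.0, 14.0, 18.0, 22.0]
predicted = [12.0, 13.0, 17.0, 24.0]

print("RMSE:", rmse(observed, predicted))
print("MAE:", mae(observed, predicted))
print("R^2:", r_squared(observed, predicted))
print("Q-Q pairs:", qq_pairs(observed, predicted))
```

RMSE and MAE are both expressed in the units of the outcome (here, months), which makes them directly interpretable at the bedside, whereas R² is unitless. The quantile pairs would normally be passed to a plotting routine and compared against the identity line.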

Authors

  • Victor E Staartjes
    Department of Neurosurgery, Bergman Clinics, Naarden, The Netherlands.
  • Julius M Kernbach
    Department of Neurosurgery, RWTH Aachen University Hospital, Aachen, Germany.