Empirical analyses and simulations showed that different machine and statistical learning methods had differing performance for predicting blood pressure.

Journal: Scientific Reports

Abstract

Machine learning is increasingly being used to predict clinical outcomes. Most comparisons of different methods have been based on empirical analyses in specific datasets. We used Monte Carlo simulations to determine when machine learning methods perform better than statistical learning methods in a specific setting. We evaluated six learning methods: stochastic gradient boosting machines using trees as the base learners, random forests, artificial neural networks, the lasso, ridge regression, and linear regression estimated using ordinary least squares (OLS). Our simulations were informed by empirical analyses in patients with acute myocardial infarction (AMI) and congestive heart failure (CHF) and used six data-generating processes, each based on one of the six learning methods, to simulate continuous outcomes in the derivation and validation samples. The outcome was systolic blood pressure at hospital discharge, a continuous outcome. We applied the six learning methods in each of the simulated derivation samples and evaluated performance in the simulated validation samples. The primary observation was that neural networks tended to result in estimates with worse predictive accuracy than the other five methods in both disease samples and across all six data-generating processes. Boosted trees and OLS regression tended to perform well across a range of scenarios.
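
The derivation/validation design described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes a single hypothetical linear data-generating process with one covariate and compares only two of the six methods (OLS and ridge regression), and all names and parameter values (`simulate_sample`, `fit_ridge`, the intercept, slope, noise level, and penalty) are illustrative assumptions rather than values from the study.

```python
import random
import statistics

def simulate_sample(n, rng, intercept=120.0, slope=0.5, noise_sd=15.0):
    """Draw (x, y) pairs from a hypothetical linear data-generating process.

    All parameter values are illustrative assumptions, not study estimates.
    """
    xs = [rng.gauss(70.0, 10.0) for _ in range(n)]  # one age-like covariate
    ys = [intercept + slope * (x - 70.0) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def fit_ridge(xs, ys, penalty=0.0):
    """One-predictor ridge fit with an unpenalized intercept.

    penalty=0 reduces to ordinary least squares.
    """
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + penalty)
    return y_bar - slope * x_bar, slope

def mse(model, xs, ys):
    """Mean squared prediction error of a fitted (intercept, slope) model."""
    intercept, slope = model
    errors = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return statistics.fmean(errors)

def run_simulation(n_reps=200, n_derivation=500, n_validation=500, seed=1):
    """Fit each method in a derivation sample, score it in a validation sample.

    Returns the validation MSE of each method averaged over all replications.
    """
    rng = random.Random(seed)
    methods = {"ols": 0.0, "ridge": 50.0}  # method name -> ridge penalty
    results = {name: [] for name in methods}
    for _ in range(n_reps):
        deriv_x, deriv_y = simulate_sample(n_derivation, rng)
        valid_x, valid_y = simulate_sample(n_validation, rng)
        for name, penalty in methods.items():
            model = fit_ridge(deriv_x, deriv_y, penalty)
            results[name].append(mse(model, valid_x, valid_y))
    return {name: statistics.fmean(vals) for name, vals in results.items()}

if __name__ == "__main__":
    for name, avg_mse in run_simulation().items():
        print(f"{name}: mean validation MSE = {avg_mse:.1f}")
```

Extending the sketch toward the design in the abstract would mean adding one data-generating process per learning method and fitting all six methods in every replication; the loop structure stays the same.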

Authors

  • Peter C Austin
    Institute for Clinical Evaluative Sciences (ICES), Toronto, Ontario, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada.
  • Frank E Harrell
    Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA.
  • Douglas S Lee
    Ted Rogers Centre for Heart Research, Toronto, Ontario, Canada; Peter Munk Cardiac Centre, University Health Network, Toronto, Ontario, Canada; Institute for Clinical Evaluative Sciences, Toronto, Ontario, Canada; Institute of Health Policy, Management and Evaluation, Toronto, Ontario, Canada; Toronto General Hospital Research Institute, Toronto, Ontario, Canada; University of Toronto, Toronto, Ontario, Canada. Electronic address: dlee@ices.on.ca.
  • Ewout W Steyerberg
    Department of Biomedical Data Sciences, Leiden University Medical Centre, Albinusdreef 2, Leiden, 2333 ZA The Netherlands.