How to use learning curves to evaluate the sample size for malaria prediction models developed using machine learning algorithms.

Journal: Malaria journal
Published Date:

Abstract

BACKGROUND: Machine learning algorithms have been used to predict malaria risk and severity, identify immunity biomarkers for malaria vaccine candidates, and determine molecular biomarkers of antimalarial drug resistance. Developing these prediction models requires large training datasets to ensure prediction accuracy when applied to new individuals in the target population. Learning curves can be used to assess the sample size required for the training dataset by evaluating the predictive performance of a model trained using different dataset sizes. These curves are agnostic to the specific prediction model, but their construction does require existing data. This tutorial demonstrates how to generate and interpret learning curves for malaria prediction models developed using machine learning algorithms.

Authors

  • Sophie G Zaloumis
    Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Carlton, VIC, Australia. sophiez@unimelb.edu.au.
  • Megha Rajasekhar
    Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Carlton, VIC, Australia.
  • Julie A Simpson
    Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Parkville, VIC, Australia.