Handling Class Imbalance in Machine Learning-based Prediction Models: A Case Study in Asthma Management.

Journal: Annual International Conference of the IEEE Engineering in Medicine and Biology Society
Published Date:

Abstract

A data-driven prediction tool has the potential to provide early warning of an asthma attack and improve asthma management and outcomes. Most previous machine learning (ML)-based studies for asthma attack prediction have reported a severe class imbalance, with major implications for model performance. We aimed to undertake a systematic comparison of several class imbalance handling techniques in the context of risk prediction models for asthma prognosis. We used data from 9,835 asthma patients extracted from the Medical Information Mart for Intensive Care (MIMIC) IV database and deployed five class imbalance handling methods based on the synthetic minority oversampling technique (SMOTE) and cost function customisation. We then compared how much each method improved two-class classifier models developed using logistic regression (LR) and extreme gradient boosting (XGBoost) for three different prediction tasks with varying severity of class imbalance (proportion of majority class ranging from 90.86% to 98.98%). The cost function customisation technique substantially outperformed the SMOTE-based methods in all tasks. XGBoost combined with cost function customisation achieved the highest prediction performance for the outcome with the most extreme class imbalance ratio (AUC = 0.72). Our findings suggest that the cost function customisation-based approach to tackling class imbalance provides substantially better performance than oversampling in the context of asthma management.

Clinical Relevance: This study underscores the challenge of class imbalance in the context of prediction tools to improve asthma management and outcomes and provides a methodological solution that addresses the challenge. Accurate asthma prediction tools can provide early warning and potentially prevent deterioration, thereby improving the quality of life of patients with asthma.
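
The abstract does not state how the cost function was customised. As a rough illustration only, one common form is a class-weighted log-loss in which errors on the minority (positive) class are scaled up by the majority-to-minority ratio. The function names below are hypothetical, and the weighting rule is an assumption rather than the authors' reported method; only the two majority-class proportions are taken from the abstract.

```python
import math

def positive_class_weight(majority_fraction):
    """Majority-to-minority ratio, a common (assumed) choice of minority-class weight."""
    minority_fraction = 1.0 - majority_fraction
    return majority_fraction / minority_fraction

def weighted_log_loss(y_true, y_prob, pos_weight=1.0, eps=1e-12):
    """Mean binary cross-entropy with errors on positives scaled by pos_weight."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            total += -pos_weight * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(y_true)

# Majority-class proportions reported in the abstract (least and most imbalanced tasks).
w_mild = positive_class_weight(0.9086)
w_extreme = positive_class_weight(0.9898)

# Toy data (not from the study): one positive among ten cases,
# scored by a model that always predicts a low risk of 0.05.
y_toy = [1] + [0] * 9
p_toy = [0.05] * 10
loss_plain = weighted_log_loss(y_toy, p_toy)
loss_weighted = weighted_log_loss(y_toy, p_toy, pos_weight=9.0)
```

With the weight applied, the single missed positive dominates the loss, so a model that always predicts low risk is no longer cheap. In contrast, SMOTE changes the training data by adding synthetic minority examples and leaves the loss function itself unchanged.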

Authors

  • Arif Budiarto
    Asthma UK Center for Applied Research, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom.
  • Aziz Sheikh
    Asthma UK Center for Applied Research, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom.
  • Andrew Wilson
    College of Nursing, University of Utah.
  • David B Price
  • Syed Ahmar Shah
    Asthma UK Center for Applied Research, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom.