Issue of Data Imbalance on Low Birthweight Baby Outcomes Prediction and Associated Risk Factors Identification: Establishment of Benchmarking Key Machine Learning Models With Data Rebalancing Strategies.

Journal: Journal of Medical Internet Research
Published Date:

Abstract

BACKGROUND: Low birthweight (LBW) is a leading cause of neonatal mortality in the United States and a major causative factor of adverse health effects in newborns. Identifying high-risk patients early in prenatal care is crucial to preventing adverse outcomes. Previous studies have proposed various machine learning (ML) models for the LBW prediction task, but they were limited by small and imbalanced data sets. Some authors attempted to address this limitation with different data rebalancing methods. However, most of the performances they reported did not reflect the models' actual performance in real-life scenarios. To date, few studies have successfully benchmarked the performance of ML models in maternal health; thus, it is critical to establish benchmarks that advance the use of ML and, in turn, improve birth outcomes.

Authors

  • Yang Ren
    Department of Computer Science, University of South Carolina, Columbia, SC, United States.
  • Dezhi Wu
    UofSC Big Data Health Science Center (BDHSC), University of South Carolina, Columbia, SC, United States.
  • Yan Tong
    Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR, China.
  • Ana López-DeFede
    The Institute of Families in Society, University of South Carolina, Columbia, SC, United States.
  • Sarah Gareau
    The Institute of Families in Society, University of South Carolina, Columbia, SC, United States.