Machine learning approaches for real-time ZIP code and county-level estimation of state-wide infectious disease hospitalizations using local health system data.

Journal: Epidemics

Abstract

The lack of conventional methods for estimating real-time infectious disease burden in granular regions inhibits timely and efficient public health response. Comprehensive data sources (e.g., state health department data) typically needed for such estimation are often limited by 1) substantial delays in data reporting and 2) a lack of geographic granularity in the data provided to researchers. Leveraging real-time local health system data presents an opportunity to overcome these challenges. This study evaluates the effectiveness of machine learning and statistical approaches that use local health system data to estimate current and previous COVID-19 hospitalizations in South Carolina. For current weekly hospitalizations, Random Forest models demonstrated consistently higher average median percent agreement than generalized linear mixed models across the 123 ZIP codes (72.29%, IQR: 63.20-75.62%) and 28 counties (76.43%, IQR: 70.33-81.16%) with sufficient health system coverage. To account for populations underrepresented in health systems, we combined Random Forest models with Classification and Regression Trees (CART) for imputation. With this combined approach, the average median percent agreement was 61.02% (IQR: 51.17-72.29%) for all ZIP codes and 72.64% (IQR: 66.13-77.69%) for all counties. Median percent agreement for cumulative hospitalizations over the previous 6 months was 80.98% (IQR: 68.99-89.66%) for all ZIP codes and 81.17% (IQR: 68.55-91.33%) for all counties. These findings emphasize the effectiveness of using real-time health system data to estimate infectious disease burden. Moreover, the methodologies developed in this study can be adapted to estimate hospitalizations for other diseases, offering public health officials a valuable tool for responding swiftly and effectively to various health crises.

Authors

  • Tanvir Ahammed
    Department of Public Health Sciences, Clemson University, Clemson, SC, USA; Center for Public Health Modeling and Response, Clemson University, Clemson, SC, USA.
  • Md Sakhawat Hossain
    School of Mechanical, Aerospace, and Materials Engineering, Southern Illinois University Carbondale, Carbondale, IL 62901, USA.
  • Christopher McMahan
    Center for Public Health Modeling and Response, Clemson University, Clemson, SC, USA; School of Mathematical and Statistical Sciences, Clemson University, Clemson, SC, USA.
  • Lior Rennert
    Department of Public Health Sciences, College of Behavioral, Social, and Health Sciences, Clemson University, Clemson, SC, USA.