Comparative Evaluation of Logistic Regression and Gradient Boosting Models for Influenza Outbreak Early-Warning Using U.S. CDC ILINet Surveillance Data (2010-2025)

Journal: medRxiv
Published Date:

Abstract

Background: Timely detection of seasonal influenza outbreaks is critical for healthcare system preparedness and public health response. Although numerous studies have examined short-term influenza forecasting, fewer have operationalized prediction as a binary early-warning problem linked to actionable surveillance thresholds. This study evaluated the performance of traditional and machine learning models for detecting national influenza outbreak weeks using U.S. Centers for Disease Control and Prevention (CDC) ILINet surveillance data.

Methods: Weekly national ILINet data from 2010-2025 were analyzed. Outbreak weeks were defined as those in which weighted influenza-like illness (ILIPERCENT) exceeded the 90th percentile of the 2010-2017 training distribution (threshold = 3.3932%). Predictors included three-week lags of ILIPERCENT and percent positive laboratory specimens, along with seasonal harmonic terms. Models were trained on 2010-2017 data and evaluated on a temporally held-out 2020-2025 test period. Performance metrics included area under the receiver operating characteristic curve (AUC), precision-recall area under the curve (PR-AUC), sensitivity, specificity, precision, and F1-score.

Findings: On the 2020-2025 test set, logistic regression achieved an AUC of 0.9964 and PR-AUC of 0.9868, with sensitivity of 1.0000 and specificity of 0.9516. XGBoost achieved an AUC of 0.9946 and PR-AUC of 0.9812, with sensitivity of 0.8939 and specificity of 0.9798. Both models demonstrated near-perfect discrimination between outbreak and non-outbreak weeks under strict temporal validation.

Interpretation: National influenza outbreak early-warning can be implemented using publicly available CDC surveillance data with high discriminatory accuracy. Framing prediction as a threshold-based outbreak detection problem strengthens operational relevance and supports integration of predictive analytics into routine influenza surveillance and preparedness planning.
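The labelling, feature construction, and threshold-based metrics summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes that "three-week lags" means lags of 1, 2, and 3 weeks, that the target is the current week's outbreak status, that the harmonic terms are a single annual sine/cosine pair over a 52-week cycle, and that the 90th percentile uses linear interpolation; the abstract specifies none of these. The function names (`percentile`, `build_rows`, `classification_metrics`) and record keys are hypothetical.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics.
    Assumption: the abstract does not state which percentile method was used."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must be non-empty")
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def build_rows(records, train_years=(2010, 2017), n_lags=3, weeks_per_year=52.0):
    """Label outbreak weeks and build lagged + harmonic features.

    records: time-ordered list of dicts with hypothetical keys
             'year', 'week', 'ili_percent', 'pct_positive'.
    Returns (threshold, rows); the threshold is computed from training years only,
    so no information from the test period leaks into the outbreak definition.
    """
    train_vals = [r["ili_percent"] for r in records
                  if train_years[0] <= r["year"] <= train_years[1]]
    threshold = percentile(train_vals, 90)

    rows = []
    for i in range(n_lags, len(records)):
        cur = records[i]
        feats = {}
        for k in range(1, n_lags + 1):  # assumed lags of 1..n_lags weeks
            feats[f"ili_lag{k}"] = records[i - k]["ili_percent"]
            feats[f"pos_lag{k}"] = records[i - k]["pct_positive"]
        angle = 2.0 * math.pi * cur["week"] / weeks_per_year
        feats["sin_week"] = math.sin(angle)  # assumed single annual harmonic
        feats["cos_week"] = math.cos(angle)
        rows.append({
            "year": cur["year"],
            "week": cur["week"],
            "features": feats,
            "outbreak": int(cur["ili_percent"] > threshold),
        })
    return threshold, rows


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, precision, and F1 from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    nan = float("nan")
    sens = tp / (tp + fn) if tp + fn else nan
    spec = tn / (tn + fp) if tn + fp else nan
    prec = tp / (tp + fp) if tp + fp else nan
    f1 = 2 * prec * sens / (prec + sens) if tp else nan
    return {"sensitivity": sens, "specificity": spec, "precision": prec, "f1": f1}
```

Under these assumptions, rows whose year falls in 2010-2017 would be used to fit a classifier and rows from 2020-2025 to evaluate it. AUC and PR-AUC would additionally require the models' predicted probabilities, which this sketch does not produce.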

Authors

  • Onwuameze, C. N.
  • Madu, V.