Building and Validating an Explainable Machine Learning Model for Predicting Health-Promoting Behaviors in Older Adults: A Multicenter Study.
Journal:
Prevention science : the official journal of the Society for Prevention Research
Published Date:
Mar 24, 2026
Abstract
Enhancing health-promoting behaviors (HPBs) in older adults is crucial for chronic disease management and healthy aging in the context of population aging. Accurate assessment of individual HPB levels can facilitate the development of personalized interventions. This study aimed to identify factors influencing HPBs in older adults using multicenter data and to develop and validate an interpretable machine learning (ML) model for prediction. We conducted a multicenter cross-sectional study among 781 older adults in Shanghai, Jiangsu, and Shandong from June 2024 to September 2025. The collected data included sociodemographic characteristics, health status, community sports facility conditions, mobile phone proficiency, and internet skills. Data from the Shanghai (n = 319) and Shandong (n = 228) centers formed the training set, and data from the Jiangsu center (n = 234) constituted the independent external test set. Discrimination and classification performance were evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, specificity, positive and negative predictive values (PPV, NPV), recall, and F1-score. Calibration was assessed with the Hosmer-Lemeshow test and Brier score, and clinical utility was evaluated via decision curve analysis (DCA). The mean age of participants was 61.79 ± 11.54 years. Based on HPB levels, 436 (55.8%) participants were categorized into the HPB group and 345 (44.2%) into the no-HPB group. On the external test set, the Stochastic Gradient Boosting Trees (SGBT) model performed best, with an AUC of 0.891 (95% CI, 0.848-0.951), a Brier score of 0.103 indicating good calibration, and a calibration curve closely aligned with the ideal line. Additional metrics included accuracy (0.895), specificity (0.867), PPV (0.897), NPV (0.892), recall (0.917), and F1-score (0.907). DCA indicated a high net clinical benefit across a wide probability threshold range (0-0.6).
SHapley Additive exPlanations (SHAP) analysis quantified the contribution of each feature, and a user-friendly online prediction platform was deployed. We developed a high-performance, interpretable ML model to predict HPBs in older adults and systematically identified key predictors such as internet proficiency, educational level, and functional independence. This tool can assist healthcare professionals in rapidly assessing HPB levels, facilitating the targeted delivery of health information and services.
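
For readers less familiar with the evaluation metrics named in the abstract, the following is a minimal Python sketch of how threshold-based metrics, the Brier score, and the net benefit used in decision curve analysis are conventionally computed. It is not the authors' code: the function names, the toy labels and probabilities, and the 0.5 cutoff are illustrative assumptions, and the study's data are not reproduced here. The only figures taken from the abstract are the reported PPV (0.897) and recall (0.917), which are used to check that they agree with the reported F1-score.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard definitions; assumes every denominator is non-zero."""
    recall = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "specificity": tn / (tn + fp),
        "ppv": ppv,
        "npv": tn / (tn + fn),
        "recall": recall,
        "f1": f1_from(ppv, recall),
    }


def f1_from(ppv: float, recall: float) -> float:
    """F1 is the harmonic mean of PPV (precision) and recall."""
    return 2 * ppv * recall / (ppv + recall)


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Net benefit at one probability threshold (requires threshold < 1)."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * (threshold / (1 - threshold))


# Toy data for illustration only (hypothetical, not from the study).
toy_true = [1, 1, 1, 0, 0, 1, 0, 1]
toy_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1, 0.55]

toy_metrics = classification_metrics(*confusion_counts(toy_true, toy_prob))
toy_brier = brier_score(toy_true, toy_prob)
toy_nb = net_benefit(toy_true, toy_prob, threshold=0.5)

# Consistency check using the PPV and recall reported in the abstract.
reported_f1 = round(f1_from(0.897, 0.917), 3)
```

Under these definitions, the reported PPV and recall give an F1-score that rounds to the 0.907 stated in the abstract. A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing the model against the treat-all and treat-none strategies.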