Explainable machine learning framework for biomarker discovery by combining biological age and frailty prediction.
Journal: Scientific Reports
PMID: 40263505
Abstract
Biological age (BA) and frailty are two distinct health measures that offer valuable insights into the aging process. Comparing blood-based biomarkers across machine learning (ML) predictors of BA and frailty can deepen our understanding of aging. This study aimed to develop a novel framework to identify biomarkers of aging by combining BA and frailty ML predictors with eXplainable Artificial Intelligence (XAI) techniques. We used data from middle-aged and older Chinese adults (≥ 45 years) in the China Health and Retirement Longitudinal Study (CHARLS): the 2011/2012 wave (n = 9702) and the 2015/2016 wave (n = 9455, used as the test set). Sixteen blood-based biomarkers were used to predict BA and frailty. Four tree-based ML algorithms were trained and validated, and their performance metrics were compared to select the best model for each outcome. SHapley Additive exPlanations (SHAP) analysis was then conducted on the selected models. CatBoost performed best for BA prediction, and Gradient Boosting performed best for frailty prediction. Traditional ML feature importance identified cystatin C and glycated hemoglobin as the major contributors for their respective models. However, subsequent SHAP analysis showed that only cystatin C was the primary contributor in both models. The proposed framework can easily incorporate additional biomarkers, providing a scalable and comprehensive toolset that offers a quantitative understanding of biomarkers of aging.
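To make the SHAP step concrete, the sketch below computes exact Shapley values by enumerating feature coalitions and then averages their absolute values into a global ranking. It is an illustration under stated assumptions, not the study's pipeline. The rule-based `toy_model` stands in for a trained tree ensemble, the third feature (`crp`) is assumed for illustration, and all thresholds, sample values, and the baseline are invented. The SHAP library uses faster tree-specific algorithms that are not reproduced here.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature subset; only cystatin C and glycated hemoglobin
# are named in the abstract, and "crp" is an assumption for illustration.
FEATURES = ["cystatin_c", "hba1c", "crp"]


def toy_model(x):
    """Hand-written stand-in for a trained tree ensemble (invented rules)."""
    cys, hba1c, crp = x
    score = 0.0
    if cys > 1.0:
        score += 4.0
        if hba1c > 6.0:  # interaction between the two biomarkers
            score += 1.0
    if hba1c > 6.5:
        score += 2.0
    return score + 0.5 * crp


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def global_importance(model, samples, baseline):
    """Mean absolute Shapley value per feature across samples."""
    totals = [0.0] * len(baseline)
    for x in samples:
        for i, p in enumerate(shapley_values(model, x, baseline)):
            totals[i] += abs(p)
    return [t / len(samples) for t in totals]


# Invented reference point and samples, for illustration only.
baseline = [0.8, 5.5, 1.0]
samples = [
    [1.2, 7.0, 2.0],
    [1.1, 5.6, 1.0],
    [0.9, 6.8, 3.0],
    [1.3, 6.2, 0.5],
]

importance = global_importance(toy_model, samples, baseline)
ranking = sorted(zip(FEATURES, importance), key=lambda t: t[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

For each sample, the Shapley values sum to the difference between the model output for that sample and the output at the baseline. This additivity is what allows contributions to be compared on a common scale across different models, such as a BA predictor and a frailty predictor.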