Machine learning-based prediction models for renal impairment in Chinese adults with hyperuricaemia: risk factor analysis.
Journal: Scientific Reports
PMID: 40089508
Abstract
In hyperuricaemic populations, multiple factors may contribute to impaired renal function. This study aimed to establish a machine learning-based model to identify factors associated with renal impairment in hyperuricaemic patients, determine their dose-response relationships, and support early intervention strategies. Data were collected through the big data platform of Nanjing Hospital of Traditional Chinese Medicine and covered 2,705 patients with hyperuricaemia (1,577 with renal impairment, 828 without) between June 2019 and June 2022. After multiple imputation of missing values, the dataset was randomly split into training (70%) and validation (30%) sets. Three machine learning algorithms were used for feature selection: random forest (100 decision trees; out-of-bag (OOB) error rate, 23.34%), LASSO regression (optimal log(λ), -3.58), and XGBoost (learning rate, 0.3; maximum tree depth, 1; 50 boosting rounds). Venn diagram analysis of the features selected by all three algorithms yielded four shared predictors. A logistic regression model was then constructed on these predictors and evaluated for discrimination (AUC), calibration (Brier score), and clinical utility (decision curve analysis, DCA). Restricted cubic spline (RCS) curves were used to analyse dose-response relationships. The model, which incorporates age, cystatin C (Cys-C), uric acid (UA), and sex, performed robustly, with an AUC of 0.818 [95% CI (0.796-0.817)] in the training set and 0.82 [95% CI (0.787-0.853)] in the validation set. Brier scores were 0.160 and 0.158, respectively. Decision curve analysis identified threshold probability ranges with clinical net benefit of 6-99.02% and 7-93.14%, respectively. In the hyperuricaemic population, each 0.5 mg/L increase in Cys-C, 10-year increase in age, and 100 µmol/L increase in UA corresponded to a 13%, 81%, and 73% increase in the risk of renal impairment, respectively.
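The three evaluation measures named above can be illustrated with a minimal, dependency-free Python sketch. It assumes a binary outcome `y` (1 = renal impairment) and model-predicted probabilities `p`; the function names and toy data are hypothetical, and the abstract does not state which software the authors used. AUC is computed as a concordance probability, the Brier score as the mean squared error of the probabilities, and net benefit with the standard decision-curve formula TP/n - FP/n * pt/(1 - pt).

```python
"""Hypothetical sketch of AUC (discrimination), Brier score (calibration)
and decision-curve net benefit (clinical utility). Not the study's code."""

from typing import Sequence


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random case (y=1) gets a higher predicted
    probability than a random non-case (y=0); ties count as 1/2."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome
    (0 is perfect; lower is better)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def net_benefit(y: Sequence[int], p: Sequence[float], threshold: float) -> float:
    """Net benefit at threshold probability pt, treating p >= pt as
    'intervene': TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    y = [1, 1, 0, 1, 0, 0]
    p = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
    print("AUC:", auc(y, p))
    print("Brier:", brier(y, p))
    print("Net benefit at pt=0.5:", net_benefit(y, p, 0.5))
```

A decision curve typically plots `net_benefit` over a grid of thresholds alongside the treat-all and treat-none strategies; a reported range such as 6-99.02% is usually read off where the model's curve lies above both.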
RCS analysis revealed nonlinear relationships for age and Cys-C and a linear relationship for UA, with distribution patterns that differed by sex. The logistic regression model built on these four machine learning-selected indicators demonstrated excellent predictive performance for renal impairment in hyperuricaemic patients. These findings suggest that monitoring Cys-C and UA levels, while accounting for age and sex differences, is crucial for risk assessment and prevention strategies.
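The nonlinearity findings rest on restricted cubic splines. The sketch below builds a restricted cubic spline basis in Harrell's truncated-power parameterisation, which is cubic between knots and constrained to be linear below the first and above the last knot. The function names and the knot positions used in the test are hypothetical; the abstract does not report how many knots were used or where they were placed.

```python
"""Hypothetical sketch of a restricted cubic spline (RCS) basis
(truncated-power form, linear beyond the outer knots). Illustrative only."""

from typing import List, Sequence


def _pos_cube(u: float) -> float:
    """Truncated power function (u)_+^3."""
    return u ** 3 if u > 0 else 0.0


def rcs_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Return [x, S_1(x), ..., S_{k-2}(x)] for k distinct knots.

    Each nonlinear term is divided by (t_k - t_1)^2, a common
    normalisation that keeps the terms on roughly the scale of x."""
    t = sorted(knots)
    k = len(t)
    if k < 3 or len(set(t)) != k:
        raise ValueError("an RCS needs at least 3 distinct knots")
    scale = (t[-1] - t[0]) ** 2
    span = t[-1] - t[-2]
    terms = [x]
    for j in range(k - 2):
        # The last two truncated cubes cancel the cubic and quadratic
        # parts beyond t_k, which forces linearity in the right tail.
        s = (_pos_cube(x - t[j])
             - _pos_cube(x - t[-2]) * (t[-1] - t[j]) / span
             + _pos_cube(x - t[-1]) * (t[-2] - t[j]) / span)
        terms.append(s / scale)
    return terms


if __name__ == "__main__":
    # Hypothetical knots for age (years), not taken from the study.
    knots = [35.0, 50.0, 65.0, 80.0]
    for age in (30.0, 45.0, 60.0, 90.0):
        print(age, rcs_basis(age, knots))
```

Each age value would be expanded into these columns before fitting the logistic model. A nonlinear association shows up as nonzero coefficients on the spline terms, which is commonly checked with a joint test that they all equal zero; a purely linear fit, as reported for UA, corresponds to keeping only the first column.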