Network-based machine learning reveals cardiometabolic multimorbidity patterns and modifiable lifestyle factors: a community-focused analysis of NHANES 2015-2018.
Journal:
BMC Public Health
Published Date:
Jul 3, 2025
Abstract
Cardiometabolic multimorbidity (CMM) has emerged as one of the primary threats to human health globally because of its high incidence, disability, and mortality rates. Accurate identification of CMM patterns is crucial for CMM classification and health management. However, current research on CMM pattern recognition often neglects the complex relationships among its influencing factors. Using data from the National Health and Nutrition Examination Survey (NHANES) 2015-2018, this study included 2,306 participants with an average age of 51 years who had at least two of the following conditions: hypertension, dyslipidemia, diabetes, chronic kidney disease (CKD), and hyperuricemia. From the participants' demographic information, lifestyle indicators, biochemical indicators, and other characteristics, a CMM graph network was constructed with diseases as nodes and cosine similarity as the basis for edge calculation. The Louvain algorithm was used to partition the CMM graph network into communities, each of which defined a CMM pattern. Six machine learning models (Random Forest, Gradient Boosting, SVM, KNN, Logistic Regression, and XGBoost) were then trained with these patterns as labels to identify the key factors influencing CMM patterns. The study identified four CMM patterns: Hypertension Predominant Group (HPG, Pattern I), Uric Acid and Dyslipidemia Coexistence Group (UADCG, Pattern II), Multiple Diseases High Group (MDHG, Pattern III), and Kidney Disease Low Group (KDLG, Pattern IV) (modularity = 0.748). The four patterns differed significantly in their distribution by gender, age, marital status, education level, and family poverty-to-income ratio (PIR) (P < 0.05), and lifestyle factors also differed significantly among the four patterns (P < 0.05). Specifically, patients in the HPG (Pattern I) generally had higher nutrient intake, while those in the KDLG (Pattern IV) had relatively lower intake (P < 0.05).
Among the machine learning algorithms, Logistic Regression performed best, with an accuracy of 0.954 and an AUC-ROC of 0.998. This study combined the Louvain algorithm with machine learning algorithms for CMM pattern detection. The features that played key roles in CMM pattern recognition included intake of choline, iron, niacin, cholesterol, vitamin B2, and potassium, which can serve as references for CMM health management.
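
As an illustration of the graph step described in the abstract, the following minimal sketch builds a cosine-similarity graph and computes the modularity of a given partition, which is the quantity the Louvain algorithm seeks to maximize. It is not the authors' implementation. The toy vectors, the similarity threshold of 0.5, and all function names are assumptions made for illustration; the abstract does not state how edges were thresholded or weighted.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_similarity_graph(vectors, threshold=0.5):
    """Return {(i, j): weight} for node pairs whose similarity exceeds threshold.

    The threshold is a hypothetical choice, not a value from the study.
    """
    edges = {}
    for i, j in combinations(range(len(vectors)), 2):
        weight = cosine_similarity(vectors[i], vectors[j])
        if weight > threshold:
            edges[(i, j)] = weight
    return edges


def modularity(edges, communities):
    """Weighted modularity Q = sum over communities of L_c/m - (d_c/(2m))^2.

    m is the total edge weight, L_c the edge weight inside community c,
    and d_c the summed weighted degree of the nodes in c.
    """
    total = sum(edges.values())
    if total == 0:
        return 0.0
    strength = {}
    for (i, j), weight in edges.items():
        strength[i] = strength.get(i, 0.0) + weight
        strength[j] = strength.get(j, 0.0) + weight
    q = 0.0
    for community in communities:
        internal = sum(
            w for (i, j), w in edges.items() if i in community and j in community
        )
        degree_sum = sum(strength.get(node, 0.0) for node in community)
        q += internal / total - (degree_sum / (2 * total)) ** 2
    return q


# Toy data (hypothetical): two pairs of nodes with similar feature vectors.
toy_vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
toy_edges = build_similarity_graph(toy_vectors)
q_split = modularity(toy_edges, [{0, 1}, {2, 3}])   # two communities
q_merged = modularity(toy_edges, [{0, 1, 2, 3}])    # one community
print(sorted(toy_edges), round(q_split, 3), round(q_merged, 3))
```

In this toy case the two-community partition scores higher than the single-community one. A Louvain implementation (for example, the one in a graph library) would search over partitions automatically rather than scoring hand-picked ones as done here.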