Tlalpan 2020 Case Study: Enhancing Uric Acid Level Prediction with Machine Learning Regression and Cross-Feature Selection.

Journal: Nutrients
PMID:

Abstract

Uric acid is a key metabolic byproduct of purine degradation and plays a dual role in human health. At physiological levels, it acts as an antioxidant, protecting against oxidative stress. However, excessive uric acid can lead to hyperuricemia, contributing to conditions like gout, kidney stones, and cardiovascular diseases. Emerging evidence also links elevated uric acid levels with metabolic disorders, including hypertension and insulin resistance. Understanding its regulation is crucial for preventing associated health complications. This study, part of the Tlalpan 2020 project, aimed to predict uric acid levels using advanced machine learning algorithms. The dataset included clinical, anthropometric, lifestyle, and nutritional characteristics from a cohort in Mexico City. We applied Boosted Decision Trees (Boosted DTR), eXtreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Shapley Additive Explanations (SHAP) to identify the most relevant variables associated with hyperuricemia. Feature engineering techniques improved model performance, evaluated using Mean Squared Error (MSE), Root-Mean-Square Error (RMSE), and the coefficient of determination (R²). Our study showed that XGBoost had the highest accuracy for anthropometric and clinical predictors, while CatBoost was the most effective at identifying nutritional risk factors. Distinct predictive profiles were observed between men and women. In men, uric acid levels were primarily influenced by renal function markers, lipid profiles, and hereditary predisposition to hyperuricemia, particularly paternal gout and diabetes. Diets rich in processed meats, high-fructose foods, and sugary drinks showed stronger associations with elevated uric acid levels. In women, metabolic and cardiovascular markers, family history of metabolic disorders, and lifestyle factors such as passive smoking and sleep quality were the main contributors. Additionally, while carbohydrate intake was more strongly associated with uric acid levels in women, fructose and sugary beverages had a greater impact in men. To enhance model robustness, a cross-feature selection approach was applied, integrating top features from multiple models, which further improved predictive accuracy, particularly in gender-specific analyses. These findings provide insights into the metabolic, nutritional, and lifestyle determinants of uric acid levels, supporting targeted public health strategies for hyperuricemia prevention.
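
The evaluation metrics and the cross-feature selection idea named in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the selection procedure, so this sketch assumes it amounts to taking the union of each model's top-k features ranked by an importance score (for example, mean absolute SHAP value). The feature names, importance scores, and the helper names `top_k` and `cross_feature_select` are hypothetical and do not come from the study.

```python
import math


def mse(y_true, y_pred):
    """Mean Squared Error: average squared difference between truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-Mean-Square Error: square root of the MSE, in the units of the target."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def top_k(importances, k):
    """Return the k feature names with the largest importance scores."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def cross_feature_select(importances_by_model, k):
    """Assumed procedure: union of each model's top-k features, order preserved."""
    selected = []
    for importances in importances_by_model.values():
        for name in top_k(importances, k):
            if name not in selected:
                selected.append(name)
    return selected


# Hypothetical importance scores per model (illustrative only, not study results).
importances_by_model = {
    "xgboost": {"creatinine": 0.40, "triglycerides": 0.25, "bmi": 0.20, "age": 0.15},
    "catboost": {"fructose_intake": 0.35, "creatinine": 0.30, "sugary_drinks": 0.20, "bmi": 0.15},
}
selected = cross_feature_select(importances_by_model, k=2)

# Toy uric acid values (mg/dL) to exercise the metrics; not data from the cohort.
y_true = [5.0, 6.0, 7.0]
y_pred = [5.5, 6.0, 6.5]
scores = {"MSE": mse(y_true, y_pred), "RMSE": rmse(y_true, y_pred), "R2": r2(y_true, y_pred)}
```

In a real pipeline, the combined feature list would be used to refit each regressor, and the metrics would be computed on held-out data, separately for men and women if sex-specific models are wanted.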

Authors

  • Guadalupe Gutiérrez-Esparza
    "Researcher for Mexico" Program under SECIHTI, Secretariat of Sciences, Humanities, Technology, and Innovation, Mexico City 08400, Mexico.
  • Mireya Martínez-García
    Department of Immunology, Instituto Nacional de Cardiología Ignacio Chávez, México City, México.
  • Manlio F Márquez-Murillo
    Division of Diagnostic and Treatment Services, National Institute of Cardiology Ignacio Chávez, Mexico City 04510, Mexico.
  • Malinalli Brianza-Padilla
    Department of Immunology, National Institute of Cardiology Ignacio Chávez, Mexico City 04510, Mexico.
  • Enrique Hernández-Lemus
    Computational Genomics Division, Instituto Nacional de Medicina Genómica, México City, México.
  • Luis M Amezcua-Guerra
    Department of Immunology, Instituto Nacional de Cardiología Ignacio Chávez, México City, México.